Detecting similar files based on hash and statistical analysis for digital forensic investigation

Kimin Seo, Kyungsoo Lim, Jaemin Choi, Kisik Chang, Sangjin Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

In modern society, the use of mass storage devices is increasing rapidly, which makes finding important evidence difficult and time-consuming for forensic examiners. Examiners spend much of their time searching a variety of storage devices for files related to a case. Recently, NIST (National Institute of Standards and Technology) has developed a database called the NSRL (National Software Reference Library), which contains hash values of trusted operating systems and programs [1]. By making this database publicly available, NIST helps reduce the time spent searching files and detecting forgery on devices. However, hash-value-based detection cannot reliably measure how similar a file is to other files. In this paper, we therefore present novel methods for detecting similar files that build on known fuzzy hashing and statistical analysis, and we describe the prototype tool we developed, called SimFD.
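The limitation of exact-match hashing described in the abstract can be illustrated with a minimal sketch. This is not the authors' SimFD implementation; it is a generic fixed-size block-hash comparison, and the function names, the choice of SHA-1, and the 512-byte block size are all assumptions made for illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Whole-file SHA-1 digest, the kind of value an exact-match hash set stores."""
    return hashlib.sha1(data).hexdigest()


def block_hashes(data: bytes, block_size: int = 512) -> list[str]:
    """Hash each fixed-size block separately (block_size is an illustrative choice)."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def block_similarity(a: bytes, b: bytes, block_size: int = 512) -> float:
    """Fraction of aligned block positions whose hashes match, in [0.0, 1.0]."""
    ha, hb = block_hashes(a, block_size), block_hashes(b, block_size)
    longest = max(len(ha), len(hb))
    if longest == 0:
        return 1.0  # two empty inputs are trivially identical
    matches = sum(1 for x, y in zip(ha, hb) if x == y)
    return matches / longest


# Hypothetical test data: 4096 bytes, i.e. 8 blocks of 512 bytes.
original = bytes(range(256)) * 16
tampered = bytearray(original)
tampered[1000] ^= 0xFF  # flip one byte inside the second block
modified = bytes(tampered)

print(file_digest(original) == file_digest(modified))  # False: whole-file hashes differ
print(block_similarity(original, modified))            # 0.875: 7 of 8 blocks still match
```

A single changed byte makes the whole-file digests unrelated, while the block-level comparison still reports that most of the content is shared. Fixed, aligned blocks break down when bytes are inserted or deleted, because every later block shifts; context-triggered piecewise hashing (CTPH) addresses this by choosing block boundaries from the content itself.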

Original language: English
Title of host publication: Proceedings of the 2009 2nd International Conference on Computer Science and Its Applications, CSA 2009
DOIs: https://doi.org/10.1109/CSA.2009.5404198
Publication status: Published - 2009 Dec 1
Event: 2009 2nd International Conference on Computer Science and Its Applications, CSA 2009 - Jeju Island, Korea, Republic of
Duration: 2009 Dec 10 to 2009 Dec 12

Other

Other: 2009 2nd International Conference on Computer Science and Its Applications, CSA 2009
Country: Korea, Republic of
City: Jeju Island
Period: 09/12/10 to 09/12/12

Fingerprint

Statistical methods
Digital forensics

Keywords

  • Block-based hash
  • CTPH algorithm
  • Digital forensics
  • Hash
  • Similar files

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications

Cite this

Seo, K., Lim, K., Choi, J., Chang, K., & Lee, S. (2009). Detecting similar files based on hash and statistical analysis for digital forensic investigation. In Proceedings of the 2009 2nd International Conference on Computer Science and Its Applications, CSA 2009 [5404198] https://doi.org/10.1109/CSA.2009.5404198

@inproceedings{e74361a6b0124e6e850c5af09682f439,
title = "Detecting similar files based on hash and statistical analysis for digital forensic investigation",
abstract = "In modern society, the use of mass storage devices is increasing rapidly, which makes finding important evidence difficult and time-consuming for forensic examiners. Examiners spend much of their time searching a variety of storage devices for files related to a case. Recently, NIST (National Institute of Standards and Technology) has developed a database called the NSRL (National Software Reference Library), which contains hash values of trusted operating systems and programs [1]. By making this database publicly available, NIST helps reduce the time spent searching files and detecting forgery on devices. However, hash-value-based detection cannot reliably measure how similar a file is to other files. In this paper, we therefore present novel methods for detecting similar files that build on known fuzzy hashing and statistical analysis, and we describe the prototype tool we developed, called SimFD.",
keywords = "Block-based hash, CTPH algorithm, Digital forensics, Hash, Similar files",
author = "Kimin Seo and Kyungsoo Lim and Jaemin Choi and Kisik Chang and Sangjin Lee",
year = "2009",
month = "12",
day = "1",
doi = "10.1109/CSA.2009.5404198",
language = "English",
isbn = "9781424449460",
booktitle = "Proceedings of the 2009 2nd International Conference on Computer Science and Its Applications, CSA 2009",

}

TY - GEN
T1 - Detecting similar files based on hash and statistical analysis for digital forensic investigation
AU - Seo, Kimin
AU - Lim, Kyungsoo
AU - Choi, Jaemin
AU - Chang, Kisik
AU - Lee, Sangjin
PY - 2009/12/1
Y1 - 2009/12/1
AB - In modern society, the use of mass storage devices is increasing rapidly, which makes finding important evidence difficult and time-consuming for forensic examiners. Examiners spend much of their time searching a variety of storage devices for files related to a case. Recently, NIST (National Institute of Standards and Technology) has developed a database called the NSRL (National Software Reference Library), which contains hash values of trusted operating systems and programs [1]. By making this database publicly available, NIST helps reduce the time spent searching files and detecting forgery on devices. However, hash-value-based detection cannot reliably measure how similar a file is to other files. In this paper, we therefore present novel methods for detecting similar files that build on known fuzzy hashing and statistical analysis, and we describe the prototype tool we developed, called SimFD.
KW - Block-based hash
KW - CTPH algorithm
KW - Digital forensics
KW - Hash
KW - Similar files
UR - http://www.scopus.com/inward/record.url?scp=80655148030&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80655148030&partnerID=8YFLogxK
U2 - 10.1109/CSA.2009.5404198
DO - 10.1109/CSA.2009.5404198
M3 - Conference contribution
AN - SCOPUS:80655148030
SN - 9781424449460
BT - Proceedings of the 2009 2nd International Conference on Computer Science and Its Applications, CSA 2009
ER -