TY - JOUR
T1 - Relevance analysis using revision identifier in MS Word
AU - Joun, Jihun
AU - Chung, Hyunji
AU - Park, Jungheum
AU - Lee, Sangjin
N1 - Funding Information:
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-01000, Development of Digital Forensic Integration Platform).
Publisher Copyright:
© 2020 American Academy of Forensic Sciences
PY - 2021/1
Y1 - 2021/1
N2 - Electronic documents often contain personal or confidential information, which can be used as valuable evidence in criminal investigations. In digital investigations, special techniques are required for grouping and screening electronic documents, because it is challenging to manually analyze relationships between the numerous documents on storage devices. To this end, although techniques such as keyword search, similarity search, topic modeling, metadata analysis, and document clustering are continually being studied, they remain limited in revealing the relevance of documents. Specifically, the metadata used in previous research are not always present in the documents, and clustering methods based on specific keywords may be incomplete because text-based content (including metadata) can be easily modified or deleted by users. In this work, we propose a novel method to efficiently group Microsoft Office Word 2007+ (MS Word) files by using the revision identifier (RSID). With a thorough understanding of the RSID, examiners can predict the organizations to which a specific user belongs and may further discover unexpected interpersonal relationships. An experiment with a public dataset (GovDocs) shows that documents can be categorized more effectively by combining our proposal with previously studied methods. Furthermore, we introduce a new document tracking method to understand the editing history and movement of a file, and demonstrate its usefulness through an experiment with documents from a real case.
KW - MS Word
KW - OOXML
KW - RSID
KW - document forensics
KW - document grouping
KW - document relationships
KW - relevance analysis
KW - revision identifier
UR - http://www.scopus.com/inward/record.url?scp=85091839088&partnerID=8YFLogxK
U2 - 10.1111/1556-4029.14584
DO - 10.1111/1556-4029.14584
M3 - Article
C2 - 33006782
AN - SCOPUS:85091839088
VL - 66
SP - 323
EP - 335
JO - Journal of Forensic Sciences
JF - Journal of Forensic Sciences
SN - 0022-1198
IS - 1
ER -