TREDE and VMPOP: Cultivating multi-purpose datasets for digital forensics – A Windows registry corpus as an example

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


Demand is rising for publicly available datasets to support studying emerging technologies, performing tool testing, detecting incorrect implementations, and ensuring the reliability of security and digital forensics knowledge. While a variety of data is created on a day-to-day basis in security, forensics and incident response labs, the created data is often impractical to use or has other limitations. In this situation, a variety of researchers, practitioners and research projects have released valuable datasets that were either acquired from computer systems or digital devices used by actual users or generated during research activities. Nevertheless, there is still a significant lack of reference data for supporting a range of purposes, and there is also a need to increase the number of publicly available testbeds as well as to improve their verifiability as ‘reference’ data. Although existing datasets are useful and valuable, some of them have critical limitations in verifiability if they are acquired or created without ground truth data. This paper introduces a practical methodology for developing synthetic reference datasets in the field of security and digital forensics. The proposal divides the steps for generating a synthetic corpus into two classes: user-generated and system-generated reference data. In addition, this paper presents a novel framework that assists the development of system-generated data using a virtualization system and elaborate automated virtual machine control, and then describes a proof-of-concept implementation. Finally, this work demonstrates through practical deployment that the proposed concepts are feasible and effective, and evaluates their potential value.

Original language: English
Pages (from-to): 3-18
Number of pages: 16
Journal: Digital Investigation
Publication status: Published - 2018 Sep
Externally published: Yes


Keywords

  • Automated data generation
  • Data corpora
  • Dataset
  • Forensic infrastructure
  • Reference data
  • Synthetic data

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Information Systems
  • Computer Science Applications
  • Medical Laboratory Technology
  • Law

