Privacy-preserving data cube for electronic medical records

An experimental evaluation

Soohyung Kim, Hyukki Lee, Yon Dohn Chung

Research output: Contribution to journalArticle

11 Citations (Scopus)

Abstract

Introduction : The aim of this study is to evaluate the effectiveness and efficiency of privacy-preserving data cubes of electronic medical records (EMRs). An EMR data cube is a complex of EMR statistics that are summarized or aggregated by all possible combinations of attributes. Data cubes are widely utilized for efficient big data analysis and also have great potential for EMR analysis. For safe data analysis without privacy breaches, we must consider the privacy preservation characteristics of the EMR data cube. In this paper, we introduce a design for a privacy-preserving EMR data cube and the anonymization methods needed to achieve data privacy. We further focus on changes in efficiency and effectiveness that are caused by the anonymization process for privacy preservation. Thus, we experimentally evaluate various types of privacy-preserving EMR data cubes using several practical metrics and discuss the applicability of each anonymization method with consideration for the EMR analysis environment. Methods : We construct privacy-preserving EMR data cubes from anonymized EMR datasets. A real EMR dataset and demographic dataset are used for the evaluation. There are a large number of anonymization methods to preserve EMR privacy, and the methods are classified into three categories (i.e., global generalization, local generalization, and bucketization) by anonymization rules. According to this classification, three types of privacy-preserving EMR data cubes were constructed for the evaluation. We perform a comparative analysis by measuring the data size, cell overlap, and information loss of the EMR data cubes. Results : Global generalization considerably reduced the size of the EMR data cube and did not cause the data cube cells to overlap, but incurred a large amount of information loss. Local generalization maintained the data size and generated only moderate information loss, but there were cell overlaps that could decrease the search performance. Bucketization did not cause cells to overlap and generated little information loss; however, the method considerably inflated the size of the EMR data cubes. Conclusions : The utility of anonymized EMR data cubes varies widely according to the anonymization method, and the applicability of the anonymization method depends on the features of the EMR analysis environment. The findings help to adopt the optimal anonymization method considering the EMR analysis environment and goal of the EMR analysis.

Original languageEnglish
Pages (from-to)33-42
Number of pages10
JournalInternational Journal of Medical Informatics
Volume97
DOIs
Publication statusPublished - 2017 Jan 1

Fingerprint

Electronic Health Records
Privacy

Keywords

  • Anonymization
  • Data cube
  • Electronic medical records
  • Medical privacy

ASJC Scopus subject areas

  • Health Informatics

Cite this

Privacy-preserving data cube for electronic medical records : An experimental evaluation. / Kim, Soohyung; Lee, Hyukki; Chung, Yon Dohn.

In: International Journal of Medical Informatics, Vol. 97, 01.01.2017, p. 33-42.

Research output: Contribution to journalArticle

@article{d0b3465af99f4092a832d80f7adda7dd,
title = "Privacy-preserving data cube for electronic medical records: An experimental evaluation",
abstract = "Introduction : The aim of this study is to evaluate the effectiveness and efficiency of privacy-preserving data cubes of electronic medical records (EMRs). An EMR data cube is a complex of EMR statistics that are summarized or aggregated by all possible combinations of attributes. Data cubes are widely utilized for efficient big data analysis and also have great potential for EMR analysis. For safe data analysis without privacy breaches, we must consider the privacy preservation characteristics of the EMR data cube. In this paper, we introduce a design for a privacy-preserving EMR data cube and the anonymization methods needed to achieve data privacy. We further focus on changes in efficiency and effectiveness that are caused by the anonymization process for privacy preservation. Thus, we experimentally evaluate various types of privacy-preserving EMR data cubes using several practical metrics and discuss the applicability of each anonymization method with consideration for the EMR analysis environment. Methods : We construct privacy-preserving EMR data cubes from anonymized EMR datasets. A real EMR dataset and demographic dataset are used for the evaluation. There are a large number of anonymization methods to preserve EMR privacy, and the methods are classified into three categories (i.e., global generalization, local generalization, and bucketization) by anonymization rules. According to this classification, three types of privacy-preserving EMR data cubes were constructed for the evaluation. We perform a comparative analysis by measuring the data size, cell overlap, and information loss of the EMR data cubes. Results : Global generalization considerably reduced the size of the EMR data cube and did not cause the data cube cells to overlap, but incurred a large amount of information loss. Local generalization maintained the data size and generated only moderate information loss, but there were cell overlaps that could decrease the search performance. Bucketization did not cause cells to overlap and generated little information loss; however, the method considerably inflated the size of the EMR data cubes. Conclusions : The utility of anonymized EMR data cubes varies widely according to the anonymization method, and the applicability of the anonymization method depends on the features of the EMR analysis environment. The findings help to adopt the optimal anonymization method considering the EMR analysis environment and goal of the EMR analysis.",
keywords = "Anonymization, Data cube, Electronic medical records, Medical privacy",
author = "Soohyung Kim and Hyukki Lee and Chung, {Yon Dohn}",
year = "2017",
month = "1",
day = "1",
doi = "10.1016/j.ijmedinf.2016.09.008",
language = "English",
volume = "97",
pages = "33--42",
journal = "International Journal of Medical Informatics",
issn = "1386-5056",
publisher = "Elsevier Ireland Ltd",

}

TY - JOUR

T1 - Privacy-preserving data cube for electronic medical records

T2 - An experimental evaluation

AU - Kim, Soohyung

AU - Lee, Hyukki

AU - Chung, Yon Dohn

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Introduction : The aim of this study is to evaluate the effectiveness and efficiency of privacy-preserving data cubes of electronic medical records (EMRs). An EMR data cube is a complex of EMR statistics that are summarized or aggregated by all possible combinations of attributes. Data cubes are widely utilized for efficient big data analysis and also have great potential for EMR analysis. For safe data analysis without privacy breaches, we must consider the privacy preservation characteristics of the EMR data cube. In this paper, we introduce a design for a privacy-preserving EMR data cube and the anonymization methods needed to achieve data privacy. We further focus on changes in efficiency and effectiveness that are caused by the anonymization process for privacy preservation. Thus, we experimentally evaluate various types of privacy-preserving EMR data cubes using several practical metrics and discuss the applicability of each anonymization method with consideration for the EMR analysis environment. Methods : We construct privacy-preserving EMR data cubes from anonymized EMR datasets. A real EMR dataset and demographic dataset are used for the evaluation. There are a large number of anonymization methods to preserve EMR privacy, and the methods are classified into three categories (i.e., global generalization, local generalization, and bucketization) by anonymization rules. According to this classification, three types of privacy-preserving EMR data cubes were constructed for the evaluation. We perform a comparative analysis by measuring the data size, cell overlap, and information loss of the EMR data cubes. Results : Global generalization considerably reduced the size of the EMR data cube and did not cause the data cube cells to overlap, but incurred a large amount of information loss. Local generalization maintained the data size and generated only moderate information loss, but there were cell overlaps that could decrease the search performance. Bucketization did not cause cells to overlap and generated little information loss; however, the method considerably inflated the size of the EMR data cubes. Conclusions : The utility of anonymized EMR data cubes varies widely according to the anonymization method, and the applicability of the anonymization method depends on the features of the EMR analysis environment. The findings help to adopt the optimal anonymization method considering the EMR analysis environment and goal of the EMR analysis.

AB - Introduction : The aim of this study is to evaluate the effectiveness and efficiency of privacy-preserving data cubes of electronic medical records (EMRs). An EMR data cube is a complex of EMR statistics that are summarized or aggregated by all possible combinations of attributes. Data cubes are widely utilized for efficient big data analysis and also have great potential for EMR analysis. For safe data analysis without privacy breaches, we must consider the privacy preservation characteristics of the EMR data cube. In this paper, we introduce a design for a privacy-preserving EMR data cube and the anonymization methods needed to achieve data privacy. We further focus on changes in efficiency and effectiveness that are caused by the anonymization process for privacy preservation. Thus, we experimentally evaluate various types of privacy-preserving EMR data cubes using several practical metrics and discuss the applicability of each anonymization method with consideration for the EMR analysis environment. Methods : We construct privacy-preserving EMR data cubes from anonymized EMR datasets. A real EMR dataset and demographic dataset are used for the evaluation. There are a large number of anonymization methods to preserve EMR privacy, and the methods are classified into three categories (i.e., global generalization, local generalization, and bucketization) by anonymization rules. According to this classification, three types of privacy-preserving EMR data cubes were constructed for the evaluation. We perform a comparative analysis by measuring the data size, cell overlap, and information loss of the EMR data cubes. Results : Global generalization considerably reduced the size of the EMR data cube and did not cause the data cube cells to overlap, but incurred a large amount of information loss. Local generalization maintained the data size and generated only moderate information loss, but there were cell overlaps that could decrease the search performance. Bucketization did not cause cells to overlap and generated little information loss; however, the method considerably inflated the size of the EMR data cubes. Conclusions : The utility of anonymized EMR data cubes varies widely according to the anonymization method, and the applicability of the anonymization method depends on the features of the EMR analysis environment. The findings help to adopt the optimal anonymization method considering the EMR analysis environment and goal of the EMR analysis.

KW - Anonymization

KW - Data cube

KW - Electronic medical records

KW - Medical privacy

UR - http://www.scopus.com/inward/record.url?scp=84989189536&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989189536&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2016.09.008

DO - 10.1016/j.ijmedinf.2016.09.008

M3 - Article

VL - 97

SP - 33

EP - 42

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

SN - 1386-5056

ER -