A Comparison of the Effects of Data Imputation Methods on Model Performance

Wooyoung Kim, Wonwoong Cho, Jangho Choi, Jiyong Kim, Cheonbok Park, Jaegul Choo

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

1 Citation (Scopus)

Abstract

Missing values cause critical problems when training a prediction model. Various missing-data imputation methods have been introduced to address this problem. However, imputation accuracy alone is insufficient to validate the performance of prediction models trained on the imputed data. In this study, we therefore compare (1) the imputation accuracy of various imputation methods and (2) the effects of those methods on prediction accuracy, and we investigate the relationship between imputation accuracy and prediction accuracy. For the comparison, we use water quality data consisting of recent, real multi-sensor observations from Daecheong Lake. We conduct several experiments comparing seven imputation methods, including a state-of-the-art method, and their effects on three distinct prediction models. Through quantitative comparison and analysis, we show that both imputation accuracy and model prediction accuracy must be considered when choosing an imputation method.
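The two quantities the abstract contrasts can be computed separately. Imputation accuracy is the error on artificially masked values. Prediction accuracy is the error of a model trained on the imputed data. The sketch below shows both measures under stated assumptions.

It covers only two of the listed methods: mean imputation and linear interpolation. It does not cover kNN, SVD, MICE, random forest, or Amelia. The data is synthetic rather than the Daecheong Lake measurements. A least-squares AR(1) model, `fit_ar1`, stands in for the paper's prediction models, which the abstract does not name. All function names here are hypothetical.

```python
from math import sqrt
from statistics import mean

def mean_impute(series):
    """Replace each missing value (None) with the mean of the observed values."""
    observed_mean = mean(v for v in series if v is not None)
    return [observed_mean if v is None else v for v in series]

def linear_interpolate(series):
    """Fill gaps on a straight line between the nearest observed neighbours;
    leading/trailing gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled

def imputation_rmse(truth, imputed, masked):
    """Imputation accuracy: RMSE over the artificially masked positions only."""
    return sqrt(mean((truth[i] - imputed[i]) ** 2 for i in masked))

def fit_ar1(series):
    """Hypothetical stand-in model: least-squares fit of y[t] = a * y[t-1] + b."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx

def prediction_rmse(model, test):
    """Prediction accuracy: one-step-ahead RMSE on a clean test segment."""
    a, b = model
    return sqrt(mean((test[t] - (a * test[t - 1] + b)) ** 2
                     for t in range(1, len(test))))

# Synthetic readings with a linear trend (illustrative only, not paper data).
train_truth = [20.0 + 0.5 * t for t in range(12)]
test = [26.0 + 0.5 * t for t in range(6)]
masked = [3, 4, 8]  # positions hidden to simulate missing sensor values
observed = [None if i in masked else v for i, v in enumerate(train_truth)]

results = {}
for name, impute in [("mean", mean_impute), ("linear", linear_interpolate)]:
    filled = impute(observed)
    results[name] = (imputation_rmse(train_truth, filled, masked),
                     prediction_rmse(fit_ar1(filled), test))
    print(name, results[name])
```

On this trending series, linear interpolation recovers the masked values almost exactly, and the model trained on its output also predicts better. The abstract's conclusion is that the two measures should both be checked because they need not move together on real data. That comparison would require repeating this loop over every method and prediction model.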

Original language: English
Title of host publication: 21st International Conference on Advanced Communication Technology
Subtitle of host publication: ICT for 4th Industrial Revolution!, ICACT 2019 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 592-599
Number of pages: 8
ISBN (Electronic): 9791188428021
DOI: https://doi.org/10.23919/ICACT.2019.8702000
Publication status: Published - 2019 Apr 29
Event: 21st International Conference on Advanced Communication Technology, ICACT 2019 - Pyeongchang, Korea, Republic of
Duration: 2019 Feb 17 to 2019 Feb 20

Publication series

Name: International Conference on Advanced Communication Technology, ICACT
Volume: 2019-February
ISSN (Print): 1738-9445

Conference

Conference: 21st International Conference on Advanced Communication Technology, ICACT 2019
Country: Korea, Republic of
City: Pyeongchang
Period: 19/2/17 to 19/2/20

Keywords

  • amelia imputation
  • imputation methods
  • incomplete data
  • knn imputation
  • linear interpolation
  • mean imputation
  • mice imputation
  • missing data
  • missing values
  • model performance
  • randomforest imputation
  • SVD imputation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering


Cite this

Kim, W., Cho, W., Choi, J., Kim, J., Park, C., & Choo, J. (2019). A Comparison of the Effects of Data Imputation Methods on Model Performance. In 21st International Conference on Advanced Communication Technology: ICT for 4th Industrial Revolution!, ICACT 2019 - Proceeding (pp. 592-599). [8702000] (International Conference on Advanced Communication Technology, ICACT; Vol. 2019-February). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/ICACT.2019.8702000