A Comparison of the Effects of Data Imputation Methods on Model Performance

Wooyoung Kim, Wonwoong Cho, Jangho Choi, Jiyong Kim, Cheonbok Park, Jaegul Choo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Missing values cause critical problems when training a prediction model. Various missing-data imputation methods have been introduced to address this problem. However, the imputation accuracy of these methods alone is insufficient to validate the performance of prediction models. Thus, in this study, we compare (1) the imputation accuracy of various imputation methods and (2) the effects of those methods on prediction accuracy, investigating the relationship between imputation accuracy and prediction accuracy. For the comparison, we use water quality data composed of the latest observational multi-sensor data from Daecheong Lake. We conduct several experiments to compare seven imputation methods, including a state-of-the-art method, and their effects on three distinct prediction models. Through quantitative comparison and analysis, we show that it is necessary to consider both imputation accuracy and model prediction accuracy when choosing an imputation method.
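The two-part comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data is a synthetic sine-like signal rather than the Daecheong Lake measurements, only two of the listed methods (mean imputation and linear interpolation) are shown, and the downstream model is a hypothetical one-step autoregressive fit, not one of the three prediction models used in the paper.

```python
import math
import random


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(values):
    """Fill None entries by linear interpolation between observed neighbours.

    Leading or trailing gaps are filled with the nearest observed value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_ar1(series):
    """Least-squares fit of x[t+1] = a * x[t] + b (hypothetical stand-in model)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


# Synthetic "sensor" signal (assumption: not the paper's data).
random.seed(0)
truth = [10.0 + 3.0 * math.sin(2 * math.pi * t / 50) for t in range(200)]
train_true, test_true = truth[:150], truth[150:]

# Hide 30 training values so the imputation error can be measured.
hidden = sorted(random.sample(range(150), 30))
train_missing = [None if i in hidden else v for i, v in enumerate(train_true)]

results = {}
for name, impute in [("mean", mean_impute), ("linear", linear_interpolate)]:
    filled = impute(train_missing)
    # (1) Imputation accuracy: error on the hidden positions only.
    imp_err = rmse([filled[i] for i in hidden], [train_true[i] for i in hidden])
    # (2) Prediction accuracy: fit on imputed data, score on held-out true data.
    a, b = fit_ar1(filled)
    preds = [a * x + b for x in test_true[:-1]]
    pred_err = rmse(preds, test_true[1:])
    results[name] = (imp_err, pred_err)
    print(f"{name:>6}: imputation RMSE={imp_err:.4f}, prediction RMSE={pred_err:.4f}")
```

Reporting both numbers side by side for each method is the point of the sketch: a ranking by imputation error need not match a ranking by downstream prediction error.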

Original language: English
Title of host publication: 21st International Conference on Advanced Communication Technology
Subtitle of host publication: ICT for 4th Industrial Revolution!, ICACT 2019 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 592-599
Number of pages: 8
ISBN (Electronic): 9791188428021
DOIs: https://doi.org/10.23919/ICACT.2019.8702000
Publication status: Published - 2019 Apr 29
Event: 21st International Conference on Advanced Communication Technology, ICACT 2019 - Pyeongchang, Korea, Republic of
Duration: 2019 Feb 17 to 2019 Feb 20

Publication series

Name: International Conference on Advanced Communication Technology, ICACT
Volume: 2019-February
ISSN (Print): 1738-9445

Conference

Conference: 21st International Conference on Advanced Communication Technology, ICACT 2019
Country: Korea, Republic of
City: Pyeongchang
Period: 19/2/17 to 19/2/20

Fingerprint

Water quality
Lakes
Sensors
Experiments

Keywords

  • amelia imputation
  • imputation methods
  • incomplete data
  • knn imputation
  • linear interpolation
  • mean imputation
  • mice imputation
  • missing data
  • missing values
  • model performance
  • randomforest imputation
  • SVD imputation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Kim, W., Cho, W., Choi, J., Kim, J., Park, C., & Choo, J. (2019). A Comparison of the Effects of Data Imputation Methods on Model Performance. In 21st International Conference on Advanced Communication Technology: ICT for 4th Industrial Revolution!, ICACT 2019 - Proceeding (pp. 592-599). [8702000] (International Conference on Advanced Communication Technology, ICACT; Vol. 2019-February). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/ICACT.2019.8702000

@inproceedings{344bdadc40114428b4e5842bc911dc00,
title = "A Comparison of the Effects of Data Imputation Methods on Model Performance",
abstract = "Missing values cause critical problems when training a prediction model. Various missing-data imputation methods have been introduced to address this problem. However, the imputation accuracy of these methods alone is insufficient to validate the performance of prediction models. Thus, in this study, we compare (1) the imputation accuracy of various imputation methods and (2) the effects of those methods on prediction accuracy, investigating the relationship between imputation accuracy and prediction accuracy. For the comparison, we use water quality data composed of the latest observational multi-sensor data from Daecheong Lake. We conduct several experiments to compare seven imputation methods, including a state-of-the-art method, and their effects on three distinct prediction models. Through quantitative comparison and analysis, we show that it is necessary to consider both imputation accuracy and model prediction accuracy when choosing an imputation method.",
keywords = "amelia imputation, imputation methods, incomplete data, knn imputation, linear interpolation, mean imputation, mice imputation, missing data, missing values, model performance, randomforest imputation, SVD imputation",
author = "Wooyoung Kim and Wonwoong Cho and Jangho Choi and Jiyong Kim and Cheonbok Park and Jaegul Choo",
year = "2019",
month = "4",
day = "29",
doi = "10.23919/ICACT.2019.8702000",
language = "English",
series = "International Conference on Advanced Communication Technology, ICACT",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "592--599",
booktitle = "21st International Conference on Advanced Communication Technology",

}

TY - GEN

T1 - A Comparison of the Effects of Data Imputation Methods on Model Performance

AU - Kim, Wooyoung

AU - Cho, Wonwoong

AU - Choi, Jangho

AU - Kim, Jiyong

AU - Park, Cheonbok

AU - Choo, Jaegul

PY - 2019/4/29

Y1 - 2019/4/29

AB - Missing values cause critical problems when training a prediction model. Various missing-data imputation methods have been introduced to address this problem. However, the imputation accuracy of these methods alone is insufficient to validate the performance of prediction models. Thus, in this study, we compare (1) the imputation accuracy of various imputation methods and (2) the effects of those methods on prediction accuracy, investigating the relationship between imputation accuracy and prediction accuracy. For the comparison, we use water quality data composed of the latest observational multi-sensor data from Daecheong Lake. We conduct several experiments to compare seven imputation methods, including a state-of-the-art method, and their effects on three distinct prediction models. Through quantitative comparison and analysis, we show that it is necessary to consider both imputation accuracy and model prediction accuracy when choosing an imputation method.

KW - amelia imputation

KW - imputation methods

KW - incomplete data

KW - knn imputation

KW - linear interpolation

KW - mean imputation

KW - mice imputation

KW - missing data

KW - missing values

KW - model performance

KW - randomforest imputation

KW - SVD imputation

UR - http://www.scopus.com/inward/record.url?scp=85065674700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065674700&partnerID=8YFLogxK

U2 - 10.23919/ICACT.2019.8702000

DO - 10.23919/ICACT.2019.8702000

M3 - Conference contribution

AN - SCOPUS:85065674700

T3 - International Conference on Advanced Communication Technology, ICACT

SP - 592

EP - 599

BT - 21st International Conference on Advanced Communication Technology

PB - Institute of Electrical and Electronics Engineers Inc.

ER -