Locally linear reconstruction based missing value imputation for supervised learning

Research output: Contribution to journal > Article

31 Citations (Scopus)

Abstract

Most learning algorithms generally assume that data is complete so each attribute of all instances is filled with a valid value. However, missing values are very common in real datasets for various reasons. In this paper, we propose a new single imputation method based on locally linear reconstruction (LLR) that improves the prediction performance of supervised learning (classification & regression) with missing values. First, we investigate how missing values degrade the prediction performance with various missing ratios. Next, we compare the proposed missing value imputation method (LLR) with six well-known single imputation methods for five different learning algorithms based on 13 classification and nine regression datasets. The experimental results showed that (1) all imputation methods helped to improve the prediction accuracy, although some were very simple; (2) the proposed LLR imputation method enhanced the modeling performance more than all other imputation methods, irrespective of the learning algorithms and the missing ratios; and (3) LLR was outstanding when the missing ratio was relatively high and its prediction accuracy was similar to that of the complete dataset.
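The abstract names the method but does not spell out how it works, so the following is only a rough sketch of what an LLR-style single imputation could look like, not the paper's implementation. It assumes that neighbors are the k nearest complete rows measured on the observed attributes, that reconstruction weights are constrained to sum to one (the closed form familiar from locally linear embedding), and that a small ridge term stabilizes the solve. The names `solve_linear`, `llr_weights`, `llr_impute`, and the parameters `k` and `reg` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def llr_weights(target, neighbors, reg=1e-3):
    """Weights summing to 1 that approximately minimize
    ||target - sum_j w_j * neighbors[j]||^2 (assumed constraint set)."""
    k = len(neighbors)
    diffs = [[t - v for t, v in zip(target, nb)] for nb in neighbors]
    # Local Gram matrix of the differences between target and neighbors.
    gram = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(gram[i][i] for i in range(k))
    ridge = reg * trace if trace > 0 else reg  # assumption: trace-scaled ridge
    for i in range(k):
        gram[i][i] += ridge
    raw = solve_linear(gram, [1.0] * k)
    total = sum(raw)
    return [w / total for w in raw]


def llr_impute(row, complete_rows, k=3, reg=1e-3):
    """Return a copy of row with None entries filled from its k nearest complete rows."""
    observed = [i for i, v in enumerate(row) if v is not None]
    missing = [i for i, v in enumerate(row) if v is None]
    if not missing:
        return list(row)

    def dist(other):
        # Euclidean distance restricted to the observed attributes.
        return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

    nearest = sorted(complete_rows, key=dist)[:k]
    weights = llr_weights(
        [row[i] for i in observed],
        [[nb[i] for i in observed] for nb in nearest],
        reg,
    )
    filled = list(row)
    for i in missing:
        # Reuse the weights learned on observed attributes for the missing one.
        filled[i] = sum(w * nb[i] for w, nb in zip(weights, nearest))
    return filled


if __name__ == "__main__":
    # Toy data where the third attribute equals x + 2y.
    complete = [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 2.0],
        [1.0, 1.0, 3.0],
        [5.0, 5.0, 15.0],
    ]
    print(llr_impute([0.25, 0.25, None], complete, k=3))
```

On this toy data the imputed third value lands close to 0.75, the value implied by the underlying plane; the small deviation comes from the ridge term. Real data would also need attribute scaling and a policy for neighbors that themselves contain missing values, neither of which the abstract specifies.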

Original language: English
Pages (from-to): 65-78
Number of pages: 14
Journal: Neurocomputing
Volume: 118
DOI: 10.1016/j.neucom.2013.02.016
Publication status: Published - 2013 Oct 22
Externally published: Yes

Fingerprint

Supervised learning
Learning
Learning algorithms
Datasets

Keywords

  • Classification
  • Locally linear reconstruction (LLR)
  • Missing value imputation
  • Regression
  • Supervised learning

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Cite this

Locally linear reconstruction based missing value imputation for supervised learning. / Kang, Pilsung.

In: Neurocomputing, Vol. 118, 22.10.2013, p. 65-78.

@article{d0cd24ba4f7c484097356b21aa9f213a,
title = "Locally linear reconstruction based missing value imputation for supervised learning",
abstract = "Most learning algorithms generally assume that data is complete so each attribute of all instances is filled with a valid value. However, missing values are very common in real datasets for various reasons. In this paper, we propose a new single imputation method based on locally linear reconstruction (LLR) that improves the prediction performance of supervised learning (classification \& regression) with missing values. First, we investigate how missing values degrade the prediction performance with various missing ratios. Next, we compare the proposed missing value imputation method (LLR) with six well-known single imputation methods for five different learning algorithms based on 13 classification and nine regression datasets. The experimental results showed that (1) all imputation methods helped to improve the prediction accuracy, although some were very simple; (2) the proposed LLR imputation method enhanced the modeling performance more than all other imputation methods, irrespective of the learning algorithms and the missing ratios; and (3) LLR was outstanding when the missing ratio was relatively high and its prediction accuracy was similar to that of the complete dataset.",
keywords = "Classification, Locally linear reconstruction (LLR), Missing value imputation, Regression, Supervised learning",
author = "Pilsung Kang",
year = "2013",
month = "10",
day = "22",
doi = "10.1016/j.neucom.2013.02.016",
language = "English",
volume = "118",
pages = "65--78",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Locally linear reconstruction based missing value imputation for supervised learning
AU  - Kang, Pilsung
PY  - 2013/10/22
Y1  - 2013/10/22
N2  - Most learning algorithms generally assume that data is complete so each attribute of all instances is filled with a valid value. However, missing values are very common in real datasets for various reasons. In this paper, we propose a new single imputation method based on locally linear reconstruction (LLR) that improves the prediction performance of supervised learning (classification & regression) with missing values. First, we investigate how missing values degrade the prediction performance with various missing ratios. Next, we compare the proposed missing value imputation method (LLR) with six well-known single imputation methods for five different learning algorithms based on 13 classification and nine regression datasets. The experimental results showed that (1) all imputation methods helped to improve the prediction accuracy, although some were very simple; (2) the proposed LLR imputation method enhanced the modeling performance more than all other imputation methods, irrespective of the learning algorithms and the missing ratios; and (3) LLR was outstanding when the missing ratio was relatively high and its prediction accuracy was similar to that of the complete dataset.
AB  - Most learning algorithms generally assume that data is complete so each attribute of all instances is filled with a valid value. However, missing values are very common in real datasets for various reasons. In this paper, we propose a new single imputation method based on locally linear reconstruction (LLR) that improves the prediction performance of supervised learning (classification & regression) with missing values. First, we investigate how missing values degrade the prediction performance with various missing ratios. Next, we compare the proposed missing value imputation method (LLR) with six well-known single imputation methods for five different learning algorithms based on 13 classification and nine regression datasets. The experimental results showed that (1) all imputation methods helped to improve the prediction accuracy, although some were very simple; (2) the proposed LLR imputation method enhanced the modeling performance more than all other imputation methods, irrespective of the learning algorithms and the missing ratios; and (3) LLR was outstanding when the missing ratio was relatively high and its prediction accuracy was similar to that of the complete dataset.
KW  - Classification
KW  - Locally linear reconstruction (LLR)
KW  - Missing value imputation
KW  - Regression
KW  - Supervised learning
UR  - http://www.scopus.com/inward/record.url?scp=84881242648&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84881242648&partnerID=8YFLogxK
U2  - 10.1016/j.neucom.2013.02.016
DO  - 10.1016/j.neucom.2013.02.016
M3  - Article
AN  - SCOPUS:84881242648
VL  - 118
SP  - 65
EP  - 78
JO  - Neurocomputing
JF  - Neurocomputing
SN  - 0925-2312
ER  - 