A missing variable imputation methodology with an empirical application

Gayaneh Kyureghian, Oral Capps, Rodolfo M. Nayga, Jr

Research output: Chapter in Book/Report/Conference proceeding › Chapter

9 Citations (Scopus)

Abstract

The objective of this research is to examine, validate, and recommend techniques for handling missingness in observational data. We use a rich observational data set, the Nielsen HomeScan data set, which combines its observational nature with features usually found in simulated data: large numbers of observations, data sets, and variables, which allow elements of "design" that typically come with simulated data. We imposed uniform random missingness of 20% and 50% on our data sets and filled the resulting gaps using several widely used single imputation methods (mean, regression, and stochastic regression imputation) and multiple imputation methods. We compared these methods by measuring the error in predicting the missing values and the error in the parameter estimates from the subsequent regression analysis using the imputed values. We also compared coverage, that is, the percentage of intervals that covered the true parameter, in both cases. Based on our results, single regression (conditional mean) imputation provided the best predictions of the missing price values, with mean absolute percent errors of 28.34 and 28.59 in the 20% and 50% missingness settings, respectively. Imputation from the conditional distribution had the best rate of coverage. The parameter estimates based on data sets imputed by the conditional mean method were consistently unbiased and had the smallest standard deviations. The multiple imputation methods had the best coverage of both the parameter estimates and the predictions of the dependent variable.
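The single imputation methods and the prediction-error measure named in the abstract can be illustrated with a minimal sketch. It runs on synthetic data, not the Nielsen HomeScan data, so it does not reproduce the reported figures. The function names (make_missing, fit_ols, impute, mape, compare), the single covariate, and the data-generating parameters are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def make_missing(n, frac, rng):
    """Boolean mask marking a fixed fraction of entries as missing, uniformly at random."""
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(round(frac * n)), replace=False)] = True
    return mask

def fit_ols(X, y):
    """Least-squares coefficients and residual standard deviation (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return beta, sigma

def impute(X, y, mask, method, rng):
    """Fill y[mask] using only the observed cases; returns a completed copy of y."""
    filled = y.copy()
    obs = ~mask
    if method == "mean":
        # Unconditional mean imputation: every gap gets the observed mean.
        filled[mask] = y[obs].mean()
    else:
        # Regression (conditional mean) imputation: predict from the covariates.
        beta, sigma = fit_ols(X[obs], y[obs])
        filled[mask] = X[mask] @ beta
        if method == "stochastic":
            # Stochastic regression: add a random draw from the residual distribution.
            filled[mask] += rng.normal(0.0, sigma, size=int(mask.sum()))
    return filled

def mape(true, pred):
    """Mean absolute percent error of the imputed values."""
    return 100.0 * np.mean(np.abs((true - pred) / true))

def compare(frac, seed=0, n=2000):
    """Impose missingness on a synthetic price variable and score each method."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 5.0, size=n)
    X = np.column_stack([np.ones(n), x])
    # Synthetic "price" (assumed linear model; values stay well away from zero).
    price = 2.0 + 1.5 * x + rng.normal(0.0, 0.5, size=n)
    mask = make_missing(n, frac, rng)
    methods = ("mean", "regression", "stochastic")
    return {m: mape(price[mask], impute(X, price, mask, m, rng)[mask]) for m in methods}
```

Calling compare(0.2) and compare(0.5) mirrors the two missingness settings. Multiple imputation would repeat the stochastic step several times and pool the resulting estimates, which this sketch leaves out.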

Original language: English
Title of host publication: Advances in Econometrics
Pages: 313-337
Number of pages: 25
Volume: 27 A
DOIs: https://doi.org/10.1108/S0731-9053(2011)000027A015
Publication status: Published - 2011 Dec 1

Publication series

Name: Advances in Econometrics
Volume: 27 A
ISSN (Print): 0731-9053

Keywords

  • Missingness
  • Multiple imputation
  • Nielsen homescan data
  • Nonresponse
  • Single imputation

ASJC Scopus subject areas

  • Economics and Econometrics

Cite this

Kyureghian, G., Capps, O., & Nayga, Jr, R. M. (2011). A missing variable imputation methodology with an empirical application. In Advances in Econometrics (Vol. 27 A, pp. 313-337). [17004573] (Advances in Econometrics; Vol. 27 A). https://doi.org/10.1108/S0731-9053(2011)000027A015

@inbook{7a44db3349784af295aa8d021a38a2f9,
title = "A missing variable imputation methodology with an empirical application",
keywords = "Missingness, Multiple imputation, Nielsen homescan data, Nonresponse, Single imputation",
author = "Gayaneh Kyureghian and Oral Capps and {Nayga, Jr}, {Rodolfo M.}",
year = "2011",
month = "12",
day = "1",
doi = "10.1108/S0731-9053(2011)000027A015",
language = "English",
isbn = "9781780525242",
volume = "27 A",
series = "Advances in Econometrics",
pages = "313--337",
booktitle = "Advances in Econometrics",

}

TY - CHAP

T1 - A missing variable imputation methodology with an empirical application

AU - Kyureghian, Gayaneh

AU - Capps, Oral

AU - Nayga, Jr, Rodolfo M.

PY - 2011/12/1

Y1 - 2011/12/1

KW - Missingness

KW - Multiple imputation

KW - Nielsen homescan data

KW - Nonresponse

KW - Single imputation

UR - http://www.scopus.com/inward/record.url?scp=84874142128&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874142128&partnerID=8YFLogxK

U2 - 10.1108/S0731-9053(2011)000027A015

DO - 10.1108/S0731-9053(2011)000027A015

M3 - Chapter

AN - SCOPUS:84874142128

SN - 9781780525242

VL - 27 A

T3 - Advances in Econometrics

SP - 313

EP - 337

BT - Advances in Econometrics

ER -