Multiple predicting K-fold cross-validation for model selection

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

K-fold cross-validation (CV) is widely adopted as a model selection criterion. In K-fold CV, (K – 1) folds are used for model construction and the hold-out fold is allocated to model validation. This implies that model construction is emphasised more than model validation. However, some studies have revealed that greater emphasis on the validation procedure may result in improved model selection. Specifically, leave-m-out CV with n samples may achieve variable-selection consistency when m/n approaches 1. In this study, a new CV method is proposed within the framework of K-fold CV. The proposed method uses (K – 1) folds of the data for model validation, while the remaining fold is used for model construction. This provides (K – 1) predicted values for each observation, which are averaged to produce a final predicted value. Model selection based on the averaged predicted values can then reduce the variation in the assessment, owing to the averaging. The variable-selection consistency of the suggested method is established. Its advantage over K-fold CV with finite samples is examined under linear, non-linear, and high-dimensional models.
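The abstract describes the procedure concretely enough to sketch. Below is a minimal Python illustration of the multiple predicting K-fold CV scheme; the function name mpcv_score, the use of scikit-learn estimators, and the squared-error criterion are assumptions for illustration, not the author's code.

```python
# Minimal sketch of multiple predicting K-fold CV, as described in the
# abstract: each fold is used once for model construction, and the other
# (K - 1) folds are used for validation, so every observation is predicted
# (K - 1) times; these predictions are averaged before the error is scored.
# (mpcv_score is a hypothetical helper, not the paper's implementation.)
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def mpcv_score(model, X, y, K=10, random_state=0):
    """CV error of `model` under multiple predicting K-fold CV (assumed MSE)."""
    pred_sum = np.zeros(len(y))
    kf = KFold(n_splits=K, shuffle=True, random_state=random_state)
    for big_idx, small_idx in kf.split(X):
        # Role reversal relative to ordinary K-fold CV: the single fold
        # (small_idx) builds the model; the (K - 1) folds (big_idx) validate.
        fitted = clone(model).fit(X[small_idx], y[small_idx])
        pred_sum[big_idx] += fitted.predict(X[big_idx])
    y_bar = pred_sum / (K - 1)  # each observation is predicted K - 1 times
    return float(np.mean((y - y_bar) ** 2))

# Usage: pick the tuning parameter minimising the MPCV error (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

scores = {a: mpcv_score(Lasso(alpha=a), X, y) for a in (0.01, 0.1, 1.0)}
best_alpha = min(scores, key=scores.get)
```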

Original language: English
Pages (from-to): 197-215
Number of pages: 19
Journal: Journal of Nonparametric Statistics
Volume: 30
Issue number: 1
DOI: 10.1080/10485252.2017.1404598
Publication status: Published - 2018 Jan 2

Keywords

  • Cross-validation
  • K-fold cross-validation
  • model selection
  • tuning parameter selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Jung, Yoonsuh. Multiple predicting K-fold cross-validation for model selection. In: Journal of Nonparametric Statistics, Vol. 30, No. 1, 02.01.2018, p. 197-215.

@article{24f1bb691acb4d9692453b241492e82d,
title = "Multiple predicting K-fold cross-validation for model selection",
abstract = "K-fold cross-validation (CV) is widely adopted as a model selection criterion. In K-fold CV, (K – 1) folds are used for model construction and the hold-out fold is allocated to model validation. This implies that model construction is emphasised more than model validation. However, some studies have revealed that greater emphasis on the validation procedure may result in improved model selection. Specifically, leave-m-out CV with n samples may achieve variable-selection consistency when m/n approaches 1. In this study, a new CV method is proposed within the framework of K-fold CV. The proposed method uses (K – 1) folds of the data for model validation, while the remaining fold is used for model construction. This provides (K – 1) predicted values for each observation, which are averaged to produce a final predicted value. Model selection based on the averaged predicted values can then reduce the variation in the assessment, owing to the averaging. The variable-selection consistency of the suggested method is established. Its advantage over K-fold CV with finite samples is examined under linear, non-linear, and high-dimensional models.",
keywords = "Cross-validation, K-fold cross-validation, model selection, tuning parameter selection",
author = "Yoonsuh Jung",
year = "2018",
month = "1",
day = "2",
doi = "10.1080/10485252.2017.1404598",
language = "English",
volume = "30",
pages = "197--215",
journal = "Journal of Nonparametric Statistics",
issn = "1048-5252",
publisher = "Taylor and Francis Ltd.",
number = "1",
}
