Sequential random k-nearest neighbor feature selection for high-dimensional data

Chan Hee Park, Seoung Bum Kim

Research output: Contribution to journal › Article

37 Citations (Scopus)

Abstract

Feature selection based on an ensemble classifier has been recognized as a crucial technique for modeling high-dimensional data. Feature selection based on the random forests model, which is constructed by aggregating multiple decision tree classifiers, has been widely used. However, a lack of stability and balance in decision trees decreases the robustness of random forests. This limitation motivated us to propose a feature selection method based on newly designed nearest-neighbor ensemble classifiers. The proposed method identifies significant features through an iterative procedure. We performed experiments with 20 microarray gene expression datasets to examine the properties of the proposed method and compared it with random forests. The results demonstrated the effectiveness and robustness of the proposed method, especially when the number of features exceeds the number of observations.
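The abstract describes the method only at a high level, so the sketch below is an illustrative reading of it rather than the authors' exact algorithm: an ensemble of k-NN classifiers, each built on a random subset of the remaining candidate features, scores the features by the accuracy of the subsets they appear in, and the candidate set is pruned over several iterations. The function and parameter names (sequential_random_knn_selection, n_subsets, subset_size, n_select, and so on) are assumptions introduced here for illustration; the paper's own procedure and notation should be taken from the article itself.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score


def sequential_random_knn_selection(X, y, n_select=20, n_subsets=200,
                                    subset_size=10, k=5, n_iter=5,
                                    random_state=0):
    """Illustrative sketch of sequential random k-NN feature selection.

    Repeatedly scores features by the cross-validated accuracy of k-NN
    classifiers trained on random feature subsets, then keeps the
    highest-scoring features for the next round. Hypothetical
    parameterization; not the paper's exact algorithm.
    """
    rng = np.random.default_rng(random_state)
    candidates = np.arange(X.shape[1])  # indices of features still in play

    for _ in range(n_iter):
        scores = np.zeros(len(candidates))
        counts = np.zeros(len(candidates))

        for _ in range(n_subsets):
            # Draw a random subset of the remaining candidate features.
            idx = rng.choice(len(candidates),
                             size=min(subset_size, len(candidates)),
                             replace=False)
            subset = candidates[idx]
            # Credit every feature in the subset with the subset's accuracy.
            acc = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                  X[:, subset], y, cv=3).mean()
            scores[idx] += acc
            counts[idx] += 1

        # Average accuracy of the subsets in which each feature appeared.
        mean_scores = np.where(counts > 0, scores / np.maximum(counts, 1), 0.0)

        # Keep the better-scoring half (or the final target size) and iterate.
        keep = max(n_select, len(candidates) // 2)
        candidates = candidates[np.argsort(mean_scores)[::-1][:keep]]
        if len(candidates) <= n_select:
            break

    return candidates

As a usage example under the same assumptions, a synthetic high-dimensional problem such as X, y = sklearn.datasets.make_classification(n_samples=60, n_features=500, n_informative=10) would return the indices of the retained columns; k, the subset size, and the number of subsets are tuning knobs in this sketch only.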

Original language: English
Pages (from-to): 2336-2342
Number of pages: 7
Journal: Expert Systems with Applications
Volume: 42
Issue number: 5
DOI: 10.1016/j.eswa.2014.10.044
Publication status: Published - 2015 Apr 1

Fingerprint

  • Feature extraction
  • Classifiers
  • Decision trees
  • Microarrays
  • Gene expression
  • Experiments

Keywords

  • Ensemble Wrapper
  • Feature selection
  • High dimensionality
  • k-NN
  • Random forest

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering (all)

Cite this

Sequential random k-nearest neighbor feature selection for high-dimensional data. / Park, Chan Hee; Kim, Seoung Bum.

In: Expert Systems with Applications, Vol. 42, No. 5, 01.04.2015, p. 2336-2342.

Research output: Contribution to journal › Article

@article{74b6154481e44a959915b1c1d362d08a,
title = "Sequential random k-nearest neighbor feature selection for high-dimensional data",
abstract = "Feature selection based on an ensemble classifier has been recognized as a crucial technique for modeling high-dimensional data. Feature selection based on the random forests model, which is constructed by aggregating multiple decision tree classifiers, has been widely used. However, a lack of stability and balance in decision trees decreases the robustness of random forests. This limitation motivated us to propose a feature selection method based on newly designed nearest-neighbor ensemble classifiers. The proposed method finds significant features by using an iterative procedure. We performed experiments with 20 datasets of microarray gene expressions to examine the property of the proposed method and compared it with random forests. The results demonstrated the effectiveness and robustness of the proposed method, especially when the number of features exceeds the number of observations.",
keywords = "Ensemble Wrapper, Feature selection, High dimensionality, k-NN, Random forest",
author = "Park, {Chan Hee} and Kim, {Seoung Bum}",
year = "2015",
month = "4",
day = "1",
doi = "10.1016/j.eswa.2014.10.044",
language = "English",
volume = "42",
pages = "2336--2342",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "5",

}
