Unsupervised feature selection using weighted principal components

Seoung Bum Kim, Panaya Rattakorn

Research output: Contribution to journal › Article

32 Citations (Scopus)

Abstract

Feature selection has received considerable attention in various areas as a way to select informative features and to simplify the statistical model through dimensionality reduction. One of the most widely used methods for dimensionality reduction is principal component analysis (PCA). Despite its popularity, PCA suffers from a lack of interpretability with respect to the original features because the reduced dimensions are linear combinations of a large number of original features. Traditionally, two- or three-dimensional loading plots provide information to identify important original features in the first few principal component dimensions. However, the interpretation of a loading plot is frequently subjective, particularly when large numbers of features are involved. In this study, we propose an unsupervised feature selection method that combines weighted principal components (PCs) with a thresholding algorithm. The weighted PC is obtained by the weighted sum of the first k PCs of interest. Each loading value in the weighted PC reflects the contribution of an individual feature. We also propose a thresholding algorithm that identifies the significant features. Our experimental results with both simulated and real datasets demonstrated the effectiveness of the proposed unsupervised feature selection method.
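The abstract does not spell out the weights or the thresholding rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each of the first k PCs is weighted by its proportion of explained variance, that absolute loadings are summed, and that a simple "mean plus one standard deviation" cutoff stands in for the thresholding step. The function names `weighted_pc_scores` and `select_features` and the parameter `n_std` are hypothetical.

```python
import numpy as np


def weighted_pc_scores(X, k):
    """Return one importance score per feature from the first k PCs.

    Assumption: each PC is weighted by its share of the total variance,
    and absolute loadings are used so that signs do not cancel.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are the PC loading vectors, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2
    weights = variances[:k] / variances.sum()    # proportion of variance per PC
    return np.abs(Vt[:k]).T @ weights            # shape: (n_features,)


def select_features(scores, n_std=1.0):
    """Keep features whose score exceeds mean + n_std * std.

    Assumption: this cutoff is a placeholder, not the paper's thresholding rule.
    """
    threshold = scores.mean() + n_std * scores.std()
    return np.flatnonzero(scores > threshold)


# Toy data: features 0 and 1 share a strong common signal, the rest are noise.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
X = rng.normal(scale=0.1, size=(n, 10))
X[:, 0] += 3 * signal
X[:, 1] += 3 * signal

scores = weighted_pc_scores(X, k=2)
selected = select_features(scores)
print(selected)
```

On this toy data the two signal-carrying features dominate the first PC, so they should be the ones flagged. With real data the features would normally be standardized first, and both k and the cutoff would need to be chosen with care.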

Original language: English
Pages (from-to): 5704-5710
Number of pages: 7
Journal: Expert Systems with Applications
Volume: 38
Issue number: 5
DOI: 10.1016/j.eswa.2010.10.063
Publication status: Published - 2011 May 1

Fingerprint

Feature extraction
Principal component analysis
Statistical Models

Keywords

  • Data mining
  • Feature selection
  • Principal component analysis
  • Unsupervised learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering(all)

Cite this

Kim, Seoung Bum; Rattakorn, Panaya. Unsupervised feature selection using weighted principal components. In: Expert Systems with Applications, Vol. 38, No. 5, 01.05.2011, p. 5704-5710.

@article{267cf154ae704c61a50a28a0d4783094,
title = "Unsupervised feature selection using weighted principal components",
keywords = "Data mining, Feature selection, Principal component analysis, Unsupervised learning",
author = "Kim, Seoung Bum and Rattakorn, Panaya",
year = "2011",
month = "5",
day = "1",
doi = "10.1016/j.eswa.2010.10.063",
language = "English",
volume = "38",
pages = "5704--5710",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "5",

}

TY - JOUR

T1 - Unsupervised feature selection using weighted principal components

AU - Kim, Seoung Bum

AU - Rattakorn, Panaya

PY - 2011/5/1

Y1 - 2011/5/1

KW - Data mining

KW - Feature selection

KW - Principal component analysis

KW - Unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=79151480230&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79151480230&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2010.10.063

DO - 10.1016/j.eswa.2010.10.063

M3 - Article

AN - SCOPUS:79151480230

VL - 38

SP - 5704

EP - 5710

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 5

ER -