TY - JOUR
T1 - Unsupervised feature selection using weighted principal components
AU - Kim, Seoung Bum
AU - Rattakorn, Panaya
N1 - Funding Information:
This work was supported in part by Grant No. 2010003811 from the National Research Foundation of Korea.
Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011/5
Y1 - 2011/5
AB - Feature selection has received considerable attention in various areas as a way to select informative features and to simplify statistical models through dimensionality reduction. One of the most widely used methods for dimensionality reduction is principal component analysis (PCA). Despite its popularity, PCA offers limited interpretability in terms of the original features because the reduced dimensions are linear combinations of a large number of original features. Traditionally, two- or three-dimensional loading plots provide information to identify important original features in the first few principal component dimensions. However, the interpretation of a loading plot is frequently subjective, particularly when large numbers of features are involved. In this study, we propose an unsupervised feature selection method that combines weighted principal components (PCs) with a thresholding algorithm. The weighted PC is obtained as the weighted sum of the first k PCs of interest. Each of the k loading values in the weighted PC reflects the contribution of each individual feature. We also propose a thresholding algorithm that identifies the significant features. Our experimental results with both simulated and real datasets demonstrated the effectiveness of the proposed unsupervised feature selection method.
KW - Data mining
KW - Feature selection
KW - Principal component analysis
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=79151480230&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79151480230&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2010.10.063
DO - 10.1016/j.eswa.2010.10.063
M3 - Article
AN - SCOPUS:79151480230
VL - 38
SP - 5704
EP - 5710
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
IS - 5
ER -