TY - JOUR
T1 - A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer's Disease Diagnosis
AU - An, Le
AU - Adeli, Ehsan
AU - Liu, Mingxia
AU - Zhang, Jun
AU - Lee, Seong Whan
AU - Shen, Dinggang
N1 - Funding Information:
This work was supported in part by NIH grants (EB006733, EB008374, MH100217, MH108914, AG041721, AG049371, AG042599, AG053867, EB022880, MH110274).
PY - 2017/3/30
Y1 - 2017/3/30
N2 - Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, Alzheimer's disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, in the presence of many redundant features, the most discriminative features are difficult to identify in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP data, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach compared with competing methods.
AB - Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, Alzheimer's disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, in the presence of many redundant features, the most discriminative features are difficult to identify in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP data, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach compared with competing methods.
UR - http://www.scopus.com/inward/record.url?scp=85016748973&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016748973&partnerID=8YFLogxK
U2 - 10.1038/srep45269
DO - 10.1038/srep45269
M3 - Article
C2 - 28358032
AN - SCOPUS:85016748973
VL - 7
JO - Scientific Reports
JF - Scientific Reports
SN - 2045-2322
M1 - 45269
ER -