Controlling the false discovery rate for feature selection in high-resolution NMR spectra

Seoung Bum Kim, Victoria C P Chen, Youngja Park, Thomas R. Ziegler, Dean P. Jones

Research output: Contribution to journal (Article)

13 Citations (Scopus)

Abstract

Successful implementation of feature selection in nuclear magnetic resonance (NMR) spectra not only improves classification ability, but also simplifies the entire modeling process and, thus, reduces computational and analytical efforts. Principal component analysis (PCA) and partial least squares (PLS) have been widely used for feature selection in NMR spectra. However, extracting meaningful metabolite features from the reduced dimensions obtained through PCA or PLS is complicated because these reduced dimensions are linear combinations of a large number of the original features. In this paper, we propose a multiple testing procedure controlling false discovery rate (FDR) as an efficient method for feature selection in NMR spectra. The procedure clearly compensates for the limitation of PCA and PLS and identifies individual metabolite features necessary for classification. In addition, we present orthogonal signal correction to improve classification and visualization by removing unnecessary variations in NMR spectra. Our experimental results with real NMR spectra showed that classification models constructed with the features selected by our proposed procedure yielded smaller misclassification rates than those with all features.
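As a rough illustration of FDR-based feature selection, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-feature p-values. The abstract does not name the specific multiple testing procedure or the test that produces the p-values, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, the level `q = 0.05`, and the example p-values are all assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of features rejected at FDR level q.

    Assumed procedure: Benjamini-Hochberg step-up (not confirmed by the abstract).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Step-up rule: reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values, one per spectral feature (for example, one per
# chemical-shift bin compared between two classes of spectra).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
selected = benjamini_hochberg(p_values, q=0.05)
print(selected)  # → [0, 1]
```

The selected indices refer to individual original features rather than linear combinations of them, which is the interpretability advantage over PCA and PLS that the abstract describes. A classifier would then be fitted on those columns only.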

Original language: English
Pages (from-to): 57-66
Number of pages: 10
Journal: Statistical Analysis and Data Mining
Volume: 1
Issue number: 2
DOI: 10.1002/sam.10005
Publication status: Published - 2008 Jun 1
Externally published: Yes

Fingerprint

Nuclear Magnetic Resonance
Feature Selection
Feature extraction
High Resolution
Partial Least Squares
Principal component analysis
Metabolites
Orthogonal Signal Correction
Misclassification Rate
Multiple Testing
Process Modeling
Linear Combination
Simplify
Visualization

Keywords

  • False discovery rate
  • Feature selection
  • Metabolomics
  • Nuclear magnetic resonance
  • Orthogonal signal correction

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Analysis

Cite this

Controlling the false discovery rate for feature selection in high-resolution NMR spectra. / Kim, Seoung Bum; Chen, Victoria C P; Park, Youngja; Ziegler, Thomas R.; Jones, Dean P.

In: Statistical Analysis and Data Mining, Vol. 1, No. 2, 01.06.2008, p. 57-66.

Kim, Seoung Bum ; Chen, Victoria C P ; Park, Youngja ; Ziegler, Thomas R. ; Jones, Dean P. / Controlling the false discovery rate for feature selection in high-resolution NMR spectra. In: Statistical Analysis and Data Mining. 2008 ; Vol. 1, No. 2. pp. 57-66.
@article{ba42f3bf85ff48db8bc41738b592f253,
title = "Controlling the false discovery rate for feature selection in high-resolution NMR spectra",
abstract = "Successful implementation of feature selection in nuclear magnetic resonance (NMR) spectra not only improves classification ability, but also simplifies the entire modeling process and, thus, reduces computational and analytical efforts. Principal component analysis (PCA) and partial least squares (PLS) have been widely used for feature selection in NMR spectra. However, extracting meaningful metabolite features from the reduced dimensions obtained through PCA or PLS is complicated because these reduced dimensions are linear combinations of a large number of the original features. In this paper, we propose a multiple testing procedure controlling false discovery rate (FDR) as an efficient method for feature selection in NMR spectra. The procedure clearly compensates for the limitation of PCA and PLS and identifies individual metabolite features necessary for classification. In addition, we present orthogonal signal correction to improve classification and visualization by removing unnecessary variations in NMR spectra. Our experimental results with real NMR spectra showed that classification models constructed with the features selected by our proposed procedure yielded smaller misclassification rates than those with all features.",
keywords = "False discovery rate, Feature selection, Metabolomics, Nuclear magnetic resonance, Orthogonal signal correction",
author = "Kim, {Seoung Bum} and Chen, {Victoria C P} and Park, Youngja and Ziegler, {Thomas R.} and Jones, {Dean P.}",
year = "2008",
month = "6",
day = "1",
doi = "10.1002/sam.10005",
language = "English",
volume = "1",
pages = "57--66",
journal = "Statistical Analysis and Data Mining",
issn = "1932-1872",
publisher = "John Wiley and Sons Inc.",
number = "2",

}

TY - JOUR

T1 - Controlling the false discovery rate for feature selection in high-resolution NMR spectra

AU - Kim, Seoung Bum

AU - Chen, Victoria C P

AU - Park, Youngja

AU - Ziegler, Thomas R.

AU - Jones, Dean P.

PY - 2008/6/1

Y1 - 2008/6/1

N2 - Successful implementation of feature selection in nuclear magnetic resonance (NMR) spectra not only improves classification ability, but also simplifies the entire modeling process and, thus, reduces computational and analytical efforts. Principal component analysis (PCA) and partial least squares (PLS) have been widely used for feature selection in NMR spectra. However, extracting meaningful metabolite features from the reduced dimensions obtained through PCA or PLS is complicated because these reduced dimensions are linear combinations of a large number of the original features. In this paper, we propose a multiple testing procedure controlling false discovery rate (FDR) as an efficient method for feature selection in NMR spectra. The procedure clearly compensates for the limitation of PCA and PLS and identifies individual metabolite features necessary for classification. In addition, we present orthogonal signal correction to improve classification and visualization by removing unnecessary variations in NMR spectra. Our experimental results with real NMR spectra showed that classification models constructed with the features selected by our proposed procedure yielded smaller misclassification rates than those with all features.

KW - False discovery rate

KW - Feature selection

KW - Metabolomics

KW - Nuclear magnetic resonance

KW - Orthogonal signal correction

UR - http://www.scopus.com/inward/record.url?scp=76649092281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76649092281&partnerID=8YFLogxK

U2 - 10.1002/sam.10005

DO - 10.1002/sam.10005

M3 - Article

VL - 1

SP - 57

EP - 66

JO - Statistical Analysis and Data Mining

JF - Statistical Analysis and Data Mining

SN - 1932-1872

IS - 2

ER -