An ensemble regularization method for feature selection in mass spectral fingerprints

Younghoon Kim, Kevin A. Schug, Seoung Bum Kim

Research output: Contribution to journal, Article

2 Citations (Scopus)

Abstract

Successful identification of the significant features in complex mass spectral fingerprints is a crucial task in discriminating states or differences in natural systems (e.g., diseased vs. healthy, treated vs. untreated, and male vs. female) that are visualized using mass spectrometry technology. In this study, we present an ensemble regularization method that combines three regularization regression models to generate more robust results. Specifically, the coefficients from each of the three regularization models were bootstrapped and the means and standard deviations of these coefficients were calculated. After obtaining these estimated statistics of the coefficients, we performed a hypothesis test for each feature. Finally, we determined the significant features that were simultaneously selected by the three hypothesis tests. Mass spectral data from six different extracts of mosquito cuticles were used to evaluate the performance of the proposed method. The purpose of this spectral analysis was to determine the major features needed to differentiate married-female mosquitoes having the potential to cause malaria infection from others. In addition, we compared the proposed ensemble feature selection method with random forest, a widely used feature selection algorithm. We found that the proposed method outperformed random forest in terms of feature selection efficiency.
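The procedure summarized in the abstract (bootstrap the coefficients of three regularized regressions, test each feature, keep the features all three tests select) can be illustrated with a minimal sketch. The abstract does not name the three models or the test statistic, so the sketch assumes ridge, lasso, and elastic net fitted by coordinate descent on a squared-error loss, and a z-type test that compares |mean| / standard deviation of the bootstrapped coefficient with 1.96. The function names, penalty values, and synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def fit_elastic_net(X, y, l1, l2, n_iter=30):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||^2.

    l1=0 gives ridge, l2=0 gives the lasso, both > 0 gives the elastic net.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, which equals y while b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] + l2 == 0.0:
                continue  # constant column and no ridge term: leave at zero
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            shrunk = math.copysign(max(abs(rho) - l1, 0.0), rho)  # soft threshold
            delta = shrunk / (col_sq[j] + l2) - beta[j]
            beta[j] += delta
            for i in range(n):
                resid[i] -= X[i][j] * delta
    return beta


def bootstrap_coefs(X, y, l1, l2, n_boot, rng):
    """Refit the model on n_boot resamples (with replacement) of the rows."""
    n, p = len(X), len(X[0])
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Center the resample so that no intercept term is needed.
        x_mean = [statistics.fmean(row[j] for row in Xb) for j in range(p)]
        y_mean = statistics.fmean(yb)
        Xc = [[row[j] - x_mean[j] for j in range(p)] for row in Xb]
        yc = [v - y_mean for v in yb]
        draws.append(fit_elastic_net(Xc, yc, l1, l2))
    return draws


def significant_features(draws, z_crit=1.96):
    """Assumed test: keep feature j when |mean_j| / sd_j exceeds z_crit."""
    selected = set()
    for j in range(len(draws[0])):
        vals = [d[j] for d in draws]
        mean, sd = statistics.fmean(vals), statistics.stdev(vals)
        if sd > 0 and abs(mean) / sd > z_crit:
            selected.add(j)
    return selected


def ensemble_select(X, y, models, n_boot=50, seed=0):
    """Intersect the feature sets selected under each model's test."""
    rng = random.Random(seed)
    per_model = [
        significant_features(bootstrap_coefs(X, y, l1, l2, n_boot, rng))
        for l1, l2 in models.values()
    ]
    return set.intersection(*per_model)


# Synthetic data (illustrative only): features 0 and 1 drive the response.
data_rng = random.Random(1)
n, p = 60, 5
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + data_rng.gauss(0, 1) for row in X]

# Hypothetical (l1, l2) penalties for the three assumed models.
models = {"ridge": (0.0, 0.1), "lasso": (0.1, 0.0), "elastic net": (0.05, 0.05)}
selected = ensemble_select(X, y, models, n_boot=50, seed=0)
print(sorted(selected))
```

Requiring agreement across all three tests makes the selection conservative: a feature that only one penalty happens to favor is dropped, which is one plausible reading of the robustness the abstract claims for the ensemble.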

Original language: English
Pages (from-to): 322-328
Number of pages: 7
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 146
DOIs: 10.1016/j.chemolab.2015.05.009
Publication status: Published - 2015 Aug 5

Fingerprint

Feature extraction
Spectrum analysis
Mass spectrometry
Statistics

Keywords

  • Bootstrap
  • Cuticular hydrocarbons
  • Ensemble
  • Feature selection
  • Lipid mass spectra
  • Regularization

ASJC Scopus subject areas

  • Analytical Chemistry
  • Computer Science Applications
  • Software
  • Process Chemistry and Technology
  • Spectroscopy

Cite this

An ensemble regularization method for feature selection in mass spectral fingerprints. / Kim, Younghoon; Schug, Kevin A.; Kim, Seoung Bum.

In: Chemometrics and Intelligent Laboratory Systems, Vol. 146, 05.08.2015, p. 322-328.

@article{755616da478044788a8e929de16309cc,
title = "An ensemble regularization method for feature selection in mass spectral fingerprints",
abstract = "Successful identification of the significant features in complex mass spectral fingerprints is a crucial task in discriminating states or differences in natural systems (e.g., diseased vs. healthy, treated vs. untreated, and male vs. female) that are visualized using mass spectrometry technology. In this study, we present an ensemble regularization method that combines three regularization regression models to generate more robust results. Specifically, the coefficients from each of three regularization models were bootstrapped and the means and standard deviations of these coefficients were calculated. After obtaining these estimated statistics of the coefficients, we performed a hypothesis test for each feature. Finally, we determined the significant features that were simultaneously selected by the three hypothesis tests. Mass spectral data from six different extracts of mosquito cuticles were used to evaluate the performance of the proposed method. The purpose of this spectral analysis was to determine the major features needed to differentiate married-female mosquitoes having the potential to cause malaria infection from others. In addition, we compared the proposed ensemble feature selection method with random forest, a widely used feature selection algorithm. We found that the proposed method outperformed random forest in terms of feature selection efficiency.",
keywords = "Bootstrap, Cuticular hydrocarbons, Ensemble, Feature selection, Lipid mass spectra, Regularization",
author = "Younghoon Kim and Schug, {Kevin A.} and Kim, {Seoung Bum}",
year = "2015",
month = "8",
day = "5",
doi = "10.1016/j.chemolab.2015.05.009",
language = "English",
volume = "146",
pages = "322--328",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",

}

TY - JOUR

T1 - An ensemble regularization method for feature selection in mass spectral fingerprints

AU - Kim, Younghoon

AU - Schug, Kevin A.

AU - Kim, Seoung Bum

PY - 2015/8/5

Y1 - 2015/8/5

AB - Successful identification of the significant features in complex mass spectral fingerprints is a crucial task in discriminating states or differences in natural systems (e.g., diseased vs. healthy, treated vs. untreated, and male vs. female) that are visualized using mass spectrometry technology. In this study, we present an ensemble regularization method that combines three regularization regression models to generate more robust results. Specifically, the coefficients from each of three regularization models were bootstrapped and the means and standard deviations of these coefficients were calculated. After obtaining these estimated statistics of the coefficients, we performed a hypothesis test for each feature. Finally, we determined the significant features that were simultaneously selected by the three hypothesis tests. Mass spectral data from six different extracts of mosquito cuticles were used to evaluate the performance of the proposed method. The purpose of this spectral analysis was to determine the major features needed to differentiate married-female mosquitoes having the potential to cause malaria infection from others. In addition, we compared the proposed ensemble feature selection method with random forest, a widely used feature selection algorithm. We found that the proposed method outperformed random forest in terms of feature selection efficiency.

KW - Bootstrap

KW - Cuticular hydrocarbons

KW - Ensemble

KW - Feature selection

KW - Lipid mass spectra

KW - Regularization

UR - http://www.scopus.com/inward/record.url?scp=84934898585&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84934898585&partnerID=8YFLogxK

U2 - 10.1016/j.chemolab.2015.05.009

DO - 10.1016/j.chemolab.2015.05.009

M3 - Article

VL - 146

SP - 322

EP - 328

JO - Chemometrics and Intelligent Laboratory Systems

JF - Chemometrics and Intelligent Laboratory Systems

SN - 0169-7439

ER -