Simple decision rules for classifying human cancers from gene expression profiles

Aik-Choon Tan, Daniel Q. Naiman, Lei Xu, Raimond L. Winslow, Donald Geman

Research output: Contribution to journal › Article

253 Citations (Scopus)

Abstract

Motivation: Various studies have shown that cancer tissue samples can be successfully detected and classified by their gene expression patterns using machine learning approaches. One of the challenges in applying these techniques for classifying gene expression data is to extract accurate, readily interpretable rules providing biological insight as to how classification is performed. Current methods generate classifiers that are accurate but difficult to interpret. This is the trade-off between credibility and comprehensibility of the classifiers. Here, we introduce a new classifier in order to address these problems. It is referred to as k-TSP (k-Top Scoring Pairs) and is based on the concept of 'relative expression reversals'. This method generates simple and accurate decision rules that only involve a small number of gene-to-gene expression comparisons, thereby facilitating follow-up studies.

Results: In this study, we have compared our approach to other machine learning techniques for class prediction in 19 binary and multi-class gene expression datasets involving human cancers. The k-TSP classifier performs as efficiently as Prediction Analysis of Microarray and support vector machine, and outperforms other learning methods (decision trees, k-nearest neighbour and naïve Bayes). Our approach is easy to interpret as the classifier involves only a small number of informative genes. For these reasons, we consider the k-TSP method to be a useful tool for cancer classification from microarray gene expression data.
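
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of what a decision rule built from "relative expression reversals" could look like. It assumes the usual top-scoring-pair formulation: each gene pair (i, j) is scored by how much the frequency of the ordering x[i] < x[j] differs between the two classes, the k best disjoint pairs are kept, and a new sample is classified by majority vote over those comparisons. The function names (`pair_scores`, `fit_ktsp`, `predict_ktsp`) and the toy data are hypothetical and are not taken from the paper or its software.

```python
from itertools import combinations


def pair_scores(X, y):
    """Score every gene pair (i, j), i < j, on binary-labelled samples.

    X is a list of samples (each a list of expression values), y a list of
    0/1 labels. The score is |P(x[i] < x[j] | class 0) - P(x[i] < x[j] | class 1)|.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    scores = []
    for i, j in combinations(range(len(X[0])), 2):
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        scores.append((abs(p0 - p1), i, j, p0, p1))
    return scores


def fit_ktsp(X, y, k=1):
    """Keep the k highest-scoring pairs that share no gene (assumed selection rule)."""
    ranked = sorted(pair_scores(X, y), key=lambda s: -s[0])
    rules, used = [], set()
    for _score, i, j, p0, p1 in ranked:
        if i in used or j in used:
            continue
        # Class to vote for when x[i] < x[j] is observed in a new sample.
        rules.append((i, j, 1 if p1 > p0 else 0))
        used.update((i, j))
        if len(rules) == k:
            break
    return rules


def predict_ktsp(rules, x):
    """Majority vote over the pairwise comparisons; an odd k avoids ties."""
    votes_for_1 = 0
    for i, j, class_if_less in rules:
        votes_for_1 += class_if_less if x[i] < x[j] else 1 - class_if_less
    return 1 if 2 * votes_for_1 > len(rules) else 0


# Toy data (invented): gene 0 exceeds gene 1 in class 0 and the order flips in class 1.
X = [
    [5.0, 1.0, 3.0, 2.0],
    [6.0, 2.0, 1.0, 4.0],
    [7.0, 3.0, 2.0, 2.5],
    [1.0, 5.0, 3.0, 2.0],
    [2.0, 6.0, 1.0, 4.0],
    [3.0, 7.0, 2.0, 2.5],
]
y = [0, 0, 0, 1, 1, 1]

rules = fit_ktsp(X, y, k=1)
print(rules)
print(predict_ktsp(rules, [1.5, 4.0, 2.0, 2.0]))
```

Each learned rule reads as "if gene i is expressed below gene j, vote for this class", which is the kind of small, directly inspectable comparison the abstract highlights. Because only the within-sample ordering of two genes is used, a rule of this form is unaffected by any monotone rescaling applied to a whole sample.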

Original language: English
Pages (from-to): 3896-3904
Number of pages: 9
Journal: Bioinformatics
Volume: 21
Issue number: 20
DOI: https://doi.org/10.1093/bioinformatics/bti631
Publication status: Published - 2005 Oct 1
Externally published: Yes

Fingerprint

Gene Expression Profile
Neoplasm Genes
Decision Rules
Transcriptome
Gene expression
Cancer
Classifiers
Classifier
Gene Expression
Scoring
Microarrays
Gene Expression Data
Learning systems
Machine Learning
Genes
Cancer Classification
Gene
Neoplasms
Decision Trees
Prediction

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Tan, A-C, Naiman, DQ, Xu, L, Winslow, RL & Geman, D 2005, 'Simple decision rules for classifying human cancers from gene expression profiles', Bioinformatics, vol. 21, no. 20, pp. 3896-3904. https://doi.org/10.1093/bioinformatics/bti631
@article{efa12f5220b94952b2572d464ddb2775,
title = "Simple decision rules for classifying human cancers from gene expression profiles",
author = "Aik-Choon Tan and Naiman, {Daniel Q.} and Lei Xu and Winslow, {Raimond L.} and Donald Geman",
year = "2005",
month = "10",
day = "1",
doi = "10.1093/bioinformatics/bti631",
language = "English",
volume = "21",
pages = "3896--3904",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "20",

}
