A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data

Junhee Seok, Ronald W. Davis, Wenzhong Xiao

Research output: Contribution to journalArticle

3 Citations (Scopus)

Abstract

Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

Original languageEnglish
Article numbere0122103
JournalPloS one
Volume10
Issue number5
DOIs
Publication statusPublished - 2015 May 1

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • General

Fingerprint Dive into the research topics of 'A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data'. Together they form a unique fingerprint.

Cite this