Comprehensive and sensitive proteogenomics data analysis strategy based on complementary multi-stage database search

Inamul Hasan Madar, Wonyeop Lee, Xiaojing Wang, Seung Ik Ko, Hokeun Kim, Dong Gi Mun, Bing Zhang, Eunok Paek, Sang-Won Lee

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Proteogenomics provides opportunities for proteomic validation of gene structures, genomic alterations, and the functional relevance of novel findings obtained from genomic data analysis. However, effective proteogenomic data integration requires extensive proteome profiling that approaches the gene coverage of genomics data. Here we developed a multi-stage database search method for comprehensive proteomics data analysis to complement whole transcriptome sequencing data. The method uses two complementary database search engines, MS-GF+ and MODa/MODi, in tandem. The MS/MS data were first subjected to an MS-GF+ database search (1st stage search), and the MS/MS data left unidentified by the 1st stage search were subsequently analyzed with a combination of MODa and MODi (2nd stage search), tools for blind and unrestrictive modification search, respectively. When combined with mPE-MMR, a tool for accurate and extensive assignment of precursor masses to MS/MS data, the multi-stage method yielded a significant increase in identified peptides, modified peptides, mutated peptides, identified proteins, and coding genes compared with a conventional single-stage method. With the increased coverage of the proteome profile, the genomics and proteomics data obtained from the same gastric tumor tissue were effectively integrated, as evidenced by proBAMsuite analysis results, which showed abundant examples of peptides uniquely mapped to genomic locations as well as increased coverage of exon-exon junctions and coding regions with the multi-stage search method.
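
The routing of spectra between stages described in the abstract can be illustrated with a minimal sketch. It assumes each search stage can be modeled as a function that returns a peptide for a spectrum or nothing when the spectrum stays unidentified. The names `multi_stage_search`, `SearchStage`, and the toy lookup tables are hypothetical; the code does not call MS-GF+, MODa, MODi, or mPE-MMR, and it omits scoring and false discovery rate control.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A search stage maps a spectrum ID to a peptide string, or None when unidentified.
# These callables are hypothetical stand-ins for real search engines.
SearchStage = Callable[[str], Optional[str]]


def multi_stage_search(
    spectrum_ids: Iterable[str],
    stages: List[Tuple[str, SearchStage]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Run stages in order; each stage sees only spectra left unidentified so far.

    Returns (identified, unidentified), where identified maps
    spectrum ID -> (stage name, peptide).
    """
    identified: Dict[str, Tuple[str, str]] = {}
    remaining = list(spectrum_ids)
    for stage_name, search in stages:
        still_unidentified = []
        for spectrum_id in remaining:
            peptide = search(spectrum_id)
            if peptide is None:
                still_unidentified.append(spectrum_id)
            else:
                identified[spectrum_id] = (stage_name, peptide)
        remaining = still_unidentified
    return identified, remaining


# Toy lookup tables standing in for search-engine output (invented data).
first_stage_hits = {"scan1": "PEPTIDEK"}
second_stage_hits = {"scan2": "PEPT[+80]IDEK", "scan1": "SHOULD_NOT_BE_USED"}

identified, unidentified = multi_stage_search(
    ["scan1", "scan2", "scan3"],
    [
        ("stage1", first_stage_hits.get),  # stands in for the 1st stage search
        ("stage2", second_stage_hits.get),  # stands in for the 2nd stage search
    ],
)
```

In this toy run, `scan1` is assigned by the first stage and never reaches the second, `scan2` is picked up only by the second stage, and `scan3` remains unidentified after both.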

Original language: English
Journal: International Journal of Mass Spectrometry
DOI: 10.1016/j.ijms.2017.08.015
Publication status: Accepted/In press - 2017

Fingerprint

Peptides
Proteome
Genes
Coding
Proteins
Data integration
Exons
Sequencing
Complement
Engines
Search engines
Tumors
Proteomics
Tissue

Keywords

  • mPE-MMR
  • Multi-stage database search
  • Mutations
  • Proteogenomics
  • PTMs
  • Unidentified spectra

ASJC Scopus subject areas

  • Instrumentation
  • Condensed Matter Physics
  • Spectroscopy
  • Physical and Theoretical Chemistry

Cite this

Comprehensive and sensitive proteogenomics data analysis strategy based on complementary multi-stage database search. / Madar, Inamul Hasan; Lee, Wonyeop; Wang, Xiaojing; Ko, Seung Ik; Kim, Hokeun; Mun, Dong Gi; Zhang, Bing; Paek, Eunok; Lee, Sang-Won.

In: International Journal of Mass Spectrometry, 2017.

@article{47115fd1549443b886b482c4b2b85062,
title = "Comprehensive and sensitive proteogenomics data analysis strategy based on complementary multi-stage database search",
keywords = "mPE-MMR, Multi-stage database search, Mutations, Proteogenomics, PTMs, Unidentified spectra",
author = "Madar, {Inamul Hasan} and Wonyeop Lee and Xiaojing Wang and Ko, {Seung Ik} and Hokeun Kim and Mun, {Dong Gi} and Bing Zhang and Eunok Paek and Sang-Won Lee",
year = "2017",
doi = "10.1016/j.ijms.2017.08.015",
language = "English",
journal = "International Journal of Mass Spectrometry",
issn = "1387-3806",
publisher = "Elsevier",

}

TY  - JOUR
T1  - Comprehensive and sensitive proteogenomics data analysis strategy based on complementary multi-stage database search
AU  - Madar, Inamul Hasan
AU  - Lee, Wonyeop
AU  - Wang, Xiaojing
AU  - Ko, Seung Ik
AU  - Kim, Hokeun
AU  - Mun, Dong Gi
AU  - Zhang, Bing
AU  - Paek, Eunok
AU  - Lee, Sang-Won
PY  - 2017
Y1  - 2017
KW  - mPE-MMR
KW  - Multi-stage database search
KW  - Mutations
KW  - Proteogenomics
KW  - PTMs
KW  - Unidentified spectra
UR  - http://www.scopus.com/inward/record.url?scp=85031825383&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85031825383&partnerID=8YFLogxK
U2  - 10.1016/j.ijms.2017.08.015
DO  - 10.1016/j.ijms.2017.08.015
M3  - Article
JO  - International Journal of Mass Spectrometry
JF  - International Journal of Mass Spectrometry
SN  - 1387-3806
ER  -