Integrative Proteo-genomic Analysis to Construct CNA-protein Regulatory Map in Breast and Ovarian Tumors

Weiping Ma, Lin S. Chen, Umut Özbek, Sung Won Han, Chenwei Lin, Amanda G. Paulovich, Hua Zhong, Pei Wang

Research output: Contribution to journal › Article

Abstract

Recent developments in high-throughput proteomic and genomic profiling enable the systematic study of how genome alterations regulate protein activities. In this article, we propose a new statistical method, ProMAP, to systematically characterize the regulatory relationships between proteins and DNA copy number alterations (CNA) in breast and ovarian tumors based on proteogenomic data from the CPTAC-TCGA studies. Because of the dynamic nature of mass spectrometry instruments, proteomics data from labeled mass spectrometry experiments usually have non-ignorable batch effects. Moreover, mass spectrometry-based proteomic data often have high percentages of missing values and non-ignorable missing-data patterns. Thus, ProMAP uses a linear mixed effects model to account for the batch structure and explicitly incorporates the abundance-dependent missing-data mechanism of proteomic data. In addition, we employ a multivariate regression framework to characterize the many-to-many regulatory relationships between CNA and proteins. Further, we use statistical regularization to facilitate the detection of master genetic regulators, which affect the activities of many proteins and often play important roles in genetic regulatory networks. The improved performance of ProMAP over existing methods was illustrated through extensive simulation studies and real data examples. Applying ProMAP to the CPTAC-TCGA breast and ovarian cancer data sets, we identified many genome regions, including a few novel ones, whose CNA were associated with protein and/or phosphoprotein abundances. For example, in breast tumors, a small region in 8p11.21 was recognized as the second biggest hub in the CNA-phosphoprotein regulatory map, and further investigation of the regulatory targets suggests a potential role of 8p11.21 CNA in perturbing oxygen binding and transport activities in tumor cells. This and other findings from our analyses help to characterize the impacts of CNAs on protein activity landscapes and cast light on the genetic regulation mechanisms underlying these tumors.
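
To make the idea of regularization that favors "master regulators" concrete, the following is a minimal sketch and not the ProMAP implementation. It assumes a plain multivariate linear model with a row-wise group-lasso penalty (one common choice; the abstract does not name the penalty), and it leaves out the batch random effects and the abundance-dependent missing-data model entirely. The data are simulated, and the names `simulate` and `row_sparse_regression` are hypothetical.

```python
import numpy as np


def simulate(n=200, p=20, q=30, hub=3, seed=0):
    """Simulated toy data (assumption): one CNA feature drives every protein."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))            # stand-in for CNA features
    B = np.zeros((p, q))
    B[hub, :] = 1.0                        # the single "hub" regulator
    Y = X @ B + rng.normal(scale=0.5, size=(n, q))  # stand-in for protein abundances
    return X, Y, B


def row_sparse_regression(X, Y, lam, n_iter=500):
    """Proximal gradient descent for
        (1 / 2n) * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.
    The row-wise penalty removes a CNA from all proteins at once,
    so the rows that survive are candidate hubs.
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        Z = B - step * grad
        # Group soft-thresholding: shrink each row's norm by step * lam, floor at zero.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = shrink * Z
    return B


if __name__ == "__main__":
    X, Y, _ = simulate()
    B_hat = row_sparse_regression(X, Y, lam=0.5)
    row_norms = np.linalg.norm(B_hat, axis=1)
    print("rows with non-zero effect:", np.flatnonzero(row_norms > 1e-8))
```

Penalizing whole rows rather than individual coefficients is what ties the selection of a regulator to its effect across many proteins; an entry-wise penalty would not share that information between targets.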

Original language: English
Pages (from-to): S66-S81
Journal: Molecular & cellular proteomics : MCP
Volume: 18
Issue number: 8
DOI: 10.1074/mcp.RA118.001229
Publication status: Published - 2019 Aug 9

Fingerprint

Tumors
Breast Neoplasms
Proteomics
Mass spectrometry
Proteins
Phosphoproteins
Genes
Genome
Genomics
Ovarian Neoplasms
Neoplasms
Statistical methods
Cells
Throughput
Oxygen
DNA
Experiments

Keywords

  • breast cancer
  • cis-regulation
  • CNA-protein/phosphosite regulatory map
  • CNAI
  • mass spectrometry
  • ovarian cancer
  • penalized mixed effect model
  • phosphoproteome
  • Proteogenomics
  • statistics
  • trans protein/phosphosite hubs

ASJC Scopus subject areas

  • Analytical Chemistry
  • Biochemistry
  • Molecular Biology

Cite this

Integrative Proteo-genomic Analysis to Construct CNA-protein Regulatory Map in Breast and Ovarian Tumors. / Ma, Weiping; Chen, Lin S.; Özbek, Umut; Han, Sung Won; Lin, Chenwei; Paulovich, Amanda G.; Zhong, Hua; Wang, Pei.

In: Molecular & cellular proteomics : MCP, Vol. 18, No. 8, 09.08.2019, p. S66-S81.

@article{c75bab1bf666422fa728770b8e92e043,
title = "Integrative Proteo-genomic Analysis to Construct CNA-protein Regulatory Map in Breast and Ovarian Tumors",
keywords = "breast cancer, cis-regulation, CNA-protein/phosphosite regulatory map, CNAI, mass spectrometry, ovarian cancer, penalized mixed effect model, phosphoproteome, Proteogenomics, statistics, trans protein/phosphosite hubs",
author = "Weiping Ma and Chen, {Lin S.} and Umut {\"O}zbek and Han, {Sung Won} and Chenwei Lin and Paulovich, {Amanda G.} and Hua Zhong and Pei Wang",
year = "2019",
month = "8",
day = "9",
doi = "10.1074/mcp.RA118.001229",
language = "English",
volume = "18",
pages = "S66--S81",
journal = "Molecular and Cellular Proteomics",
issn = "1535-9476",
publisher = "American Society for Biochemistry and Molecular Biology Inc.",
number = "8",

}

TY - JOUR

T1 - Integrative Proteo-genomic Analysis to Construct CNA-protein Regulatory Map in Breast and Ovarian Tumors

AU - Ma, Weiping

AU - Chen, Lin S.

AU - Özbek, Umut

AU - Han, Sung Won

AU - Lin, Chenwei

AU - Paulovich, Amanda G.

AU - Zhong, Hua

AU - Wang, Pei

PY - 2019/8/9

Y1 - 2019/8/9

KW - breast cancer

KW - cis-regulation

KW - CNA-protein/phosphosite regulatory map

KW - CNAI

KW - mass spectrometry

KW - ovarian cancer

KW - penalized mixed effect model

KW - phosphoproteome

KW - Proteogenomics

KW - statistics

KW - trans protein/phosphosite hubs

UR - http://www.scopus.com/inward/record.url?scp=85071345622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071345622&partnerID=8YFLogxK

U2 - 10.1074/mcp.RA118.001229

DO - 10.1074/mcp.RA118.001229

M3 - Article

C2 - 31281117

AN - SCOPUS:85071345622

VL - 18

SP - S66

EP - S81

JO - Molecular and Cellular Proteomics

JF - Molecular and Cellular Proteomics

SN - 1535-9476

IS - 8

ER -