Estimation of directed subnetworks in ultra high dimensional data for gene network problems

Sung Won Han, Sung Hwan Kim, Junhee Seok, Jeewhan Yoon, Hua Zhong

Research output: Contribution to journal (article)

Abstract

Next-generation sequencing technology generates ultra-high-dimensional data, and estimating an entire directed acyclic graph (DAG) at such dimensionality is computationally impractical. In this paper, we discuss two types of problems in estimating subnetworks from ultra-high-dimensional data. The first problem is to estimate the DAG of a subnetwork adjacent to a target gene, and the second is to estimate the DAGs of multiple subnetworks when no information about a target gene is available. To address each problem, we propose efficient methods that estimate subnetworks either by using layer-dependent weights with the BIC criterion or by using community detection to identify clusters as subnetworks. We apply these approaches to breast cancer gene expression data from TCGA as a practical example.
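To make the community detection route in the abstract more concrete, the following Python sketch shows one way clusters of genes could be carved out as candidate subnetworks before any DAG is estimated. It is an illustration under stated assumptions, not the authors' implementation: the correlation threshold, the toy expression values, the function names, and the use of label propagation as the community detection step are all hypothetical choices made here, and the per-cluster DAG estimation itself is not shown.

```python
import math
from collections import Counter, defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def coexpression_graph(expr, threshold=0.8):
    """Undirected graph linking genes whose |correlation| >= threshold.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    genes = sorted(expr)
    adj = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                adj[g].add(h)
                adj[h].add(g)
    return adj


def label_propagation(adj, max_iter=100):
    """Deterministic label propagation (a stand-in community detector).

    Each gene repeatedly adopts the most common label among its neighbours;
    ties are broken by the smallest label so the result is reproducible.
    """
    labels = {g: g for g in adj}
    for _ in range(max_iter):
        changed = False
        for g in sorted(adj):
            if not adj[g]:
                continue  # isolated genes keep their own label
            counts = Counter(labels[h] for h in adj[g])
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if labels[g] != best:
                labels[g] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for g, lab in labels.items():
        clusters[lab].append(g)
    return sorted(sorted(c) for c in clusters.values())


# Toy expression profiles (made-up values): two groups of co-expressed genes.
expr = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12.5],
    "g3": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "g4": [1, -1, 1, -1, 1, -1],
    "g5": [2, -2, 2.1, -2, 2, -2],
    "g6": [1, -1, 0.9, -1, 1.1, -1],
}

clusters = label_propagation(coexpression_graph(expr))
print(clusters)
```

Each returned cluster would then be treated as a separate, much smaller subnetwork on which a DAG estimator can be run, which is what makes the overall problem tractable compared with estimating one graph over all genes.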

Original language: English
Pages (from-to): 657-676
Number of pages: 20
Journal: Statistics and its Interface
Volume: 10
Issue number: 4
DOI: 10.4310/SII.2017.v10.n4.a10
Publication status: Published - 2017

Keywords

  • Bayesian network
  • Directed acyclic graph
  • High dimension
  • Penalized likelihood
  • Subnetworks

ASJC Scopus subject areas

  • Statistics and Probability
  • Applied Mathematics

Cite this

Estimation of directed subnetworks in ultra high dimensional data for gene network problems. / Han, Sung Won; Kim, Sung Hwan; Seok, Junhee; Yoon, Jeewhan; Zhong, Hua.

In: Statistics and its Interface, Vol. 10, No. 4, 2017, p. 657-676.

@article{c04080ba5abf43d487513df8da76bf8f,
title = "Estimation of directed subnetworks in ultra high dimensional data for gene network problems",
abstract = "The next generation sequencing technology generates ultra high dimensional data. However, it is computationally impractical to estimate an entire Directed Acyclic Graph (DAG) under such high dimensionality. In this paper, we discuss two different types of problems to estimate subnetworks in ultra high dimensional data. The first problem is to estimate DAGs of a subnetwork adjacent to a target gene, and the second problem is to estimate DAGs of multiple subnetworks without information about a target gene. To address each problem, we propose efficient methods to estimate subnetworks by using layer-dependent weights with BIC criteria or by using community detection approaches to identify clusters as subnetworks. We apply such approaches to the gene expression data of breast cancer in TCGA as a practical example.",
keywords = "Bayesian network, Directed acyclic graph, High dimension, Penalized likelihood, Subnetworks",
author = "Han, {Sung Won} and Kim, {Sung Hwan} and Seok, Junhee and Yoon, Jeewhan and Zhong, Hua",
year = "2017",
doi = "10.4310/SII.2017.v10.n4.a10",
language = "English",
volume = "10",
pages = "657--676",
journal = "Statistics and its Interface",
issn = "1938-7989",
publisher = "International Press of Boston, Inc.",
number = "4",
}

TY - JOUR
T1 - Estimation of directed subnetworks in ultra high dimensional data for gene network problems
AU - Han, Sung Won
AU - Kim, Sung Hwan
AU - Seok, Junhee
AU - Yoon, Jeewhan
AU - Zhong, Hua
PY - 2017
Y1 - 2017
N2 - The next generation sequencing technology generates ultra high dimensional data. However, it is computationally impractical to estimate an entire Directed Acyclic Graph (DAG) under such high dimensionality. In this paper, we discuss two different types of problems to estimate subnetworks in ultra high dimensional data. The first problem is to estimate DAGs of a subnetwork adjacent to a target gene, and the second problem is to estimate DAGs of multiple subnetworks without information about a target gene. To address each problem, we propose efficient methods to estimate subnetworks by using layer-dependent weights with BIC criteria or by using community detection approaches to identify clusters as subnetworks. We apply such approaches to the gene expression data of breast cancer in TCGA as a practical example.

KW - Bayesian network
KW - Directed acyclic graph
KW - High dimension
KW - Penalized likelihood
KW - Subnetworks
UR - http://www.scopus.com/inward/record.url?scp=85020119744&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020119744&partnerID=8YFLogxK
U2 - 10.4310/SII.2017.v10.n4.a10
DO - 10.4310/SII.2017.v10.n4.a10
M3 - Article
VL - 10
SP - 657
EP - 676
JO - Statistics and its Interface
JF - Statistics and its Interface
SN - 1938-7989
IS - 4
ER -