Estimation of sparse directed acyclic graphs for multivariate counts data

Sung Won Han, Hua Zhong

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Next-generation sequencing data, also called high-throughput sequencing data, are recorded as counts, which are generally far from normally distributed. Under the assumption that the counts follow a Poisson log-normal distribution, this article provides an L1-penalized likelihood framework and an efficient search algorithm to estimate the structure of sparse directed acyclic graphs (DAGs) for multivariate count data. In searching for the solution, we use iterative optimization procedures to estimate the adjacency matrix and the variance matrix of the latent variables. Simulation results show that the proposed method outperforms both the approach that assumes multivariate normal distributions and the log-transformation approach. They also show that the proposed method outperforms the rank-based PC method under sparse or hub network structures. As a real data example, we demonstrate the efficiency of the proposed method in estimating the gene regulatory networks of an ovarian cancer study.
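The model described in the abstract can be made concrete with a small simulation. The Python sketch below assumes one common parameterization of a Poisson log-normal DAG: latent log-means follow a linear Gaussian structural equation model with the nodes already listed in topological order, and each observed count is Poisson given its latent value. It then fits the log-transformation baseline that the abstract compares against, regressing log(1 + X_j) on the preceding nodes with an L1 penalty. This is not the authors' estimator, which works with the Poisson log-normal likelihood under an unknown ordering; the function names, the coordinate-descent solver, the example graph, and the penalty value are all hypothetical choices made for illustration.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_pln_dag(B, mu, sigma, n, seed=0):
    """Simulate n count vectors from an ASSUMED Poisson log-normal DAG model.

    Latent layer (linear Gaussian SEM, nodes listed in topological order):
        Z_j = mu_j + sum_{k<j} B[j][k] * Z_k + e_j,   e_j ~ N(0, sigma_j^2)
    Observed layer:
        X_j | Z_j ~ Poisson(exp(Z_j))
    Entries B[j][k] with k >= j are ignored, so the graph is acyclic by construction.
    """
    rng = random.Random(seed)
    p = len(mu)
    rows = []
    for _ in range(n):
        z = [0.0] * p
        for j in range(p):
            parent_part = sum(B[j][k] * z[k] for k in range(j))
            z[j] = mu[j] + parent_part + rng.gauss(0.0, sigma[j])
        rows.append([poisson_sample(math.exp(zj), rng) for zj in z])
    return rows


def soft_threshold(a, t):
    """Lasso shrinkage operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    return math.copysign(max(abs(a) - t, 0.0), a)


def lasso_cd(columns, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 on centered data."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual starts at centered y because b = 0
    xc = []
    for col in columns:
        m = sum(col) / n
        xc.append([v - m for v in col])
    b = [0.0] * len(xc)
    for _ in range(n_iter):
        for k, x in enumerate(xc):
            sq = sum(v * v for v in x) / n
            if sq == 0.0:
                continue  # constant predictor: leave its coefficient at zero
            # Partial-residual correlation with predictor k (its own effect added back).
            rho = sum(xi * (ri + xi * b[k]) for xi, ri in zip(x, resid)) / n
            new_bk = soft_threshold(rho, lam) / sq
            delta = new_bk - b[k]
            if delta != 0.0:
                resid = [ri - xi * delta for xi, ri in zip(x, resid)]
                b[k] = new_bk
    return b


def fit_dag_known_order(counts, lam):
    """Log-transformation baseline: lasso of log(1 + X_j) on log(1 + X_k), k < j.

    Returns B_hat with B_hat[j][k] = estimated effect of node k on node j;
    entries on and above the diagonal stay zero (ordering assumed known).
    """
    p = len(counts[0])
    logged = [[math.log1p(row[j]) for row in counts] for j in range(p)]
    B_hat = [[0.0] * p for _ in range(p)]
    for j in range(1, p):
        for k, c in enumerate(lasso_cd(logged[:j], logged[j], lam)):
            B_hat[j][k] = c
    return B_hat


if __name__ == "__main__":
    # Hypothetical chain 0 -> 1 -> 2 on the latent scale.
    B = [[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, -0.6, 0.0]]
    counts = simulate_pln_dag(B, mu=[1.0, 0.5, 1.5], sigma=[0.5, 0.5, 0.5], n=500, seed=1)
    for row in fit_dag_known_order(counts, lam=0.05):
        print([round(v, 2) for v in row])
```

Because the predictors here are log-transformed Poisson counts rather than the latent values themselves, the fitted coefficients are typically shrunk toward zero relative to B, on top of the shrinkage from the L1 penalty.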

Original language: English
Pages (from-to): 791-803
Number of pages: 13
Journal: Biometrics
Volume: 72
Issue number: 3
DOIs: 10.1111/biom.12467
Publication status: Published - 2016 Sep 1
Externally published: Yes

Fingerprint

Count Data
Directed Acyclic Graph
Multivariate Data
Normal distribution
Sequencing
Ovarian Cancer
Penalized Likelihood
ovarian neoplasms
Multivariate Normal Distribution
Log Normal Distribution
Gene Regulatory Network
Poisson distribution
Latent Variables
Adjacency Matrix
Genes
methodology
Throughput
Network Structure
Estimate

Keywords

  • Bayesian network
  • Count data
  • Directed acyclic graph
  • Lasso estimation
  • Penalized likelihood estimation
  • Unknown variable ordering
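Two of these keywords, directed acyclic graph and unknown variable ordering, are linked by a standard fact: a directed graph is acyclic exactly when its nodes admit a topological order, in which every edge points from an earlier node to a later one. When the ordering is not known in advance, a candidate adjacency matrix therefore has to be checked for cycles. The sketch below does this with Kahn's algorithm; the function name and the convention that a nonzero A[j][k] denotes an edge k -> j (node k is a parent of node j) are assumptions for illustration, not taken from the paper.

```python
from collections import deque


def topological_order(A, tol=0.0):
    """Return a topological order of the graph, or None if it has a directed cycle.

    Assumed convention: abs(A[j][k]) > tol means a directed edge k -> j.
    """
    p = len(A)
    children = [[] for _ in range(p)]
    indegree = [0] * p
    for j in range(p):
        for k in range(p):
            if abs(A[j][k]) > tol:
                children[k].append(j)
                indegree[j] += 1
    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents.
    queue = deque(j for j in range(p) if indegree[j] == 0)
    order = []
    while queue:
        k = queue.popleft()
        order.append(k)
        for j in children[k]:
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    # Nodes left over sit on (or downstream of) a cycle, including self-loops.
    return order if len(order) == p else None


if __name__ == "__main__":
    chain = [[0, 0, 0], [0.8, 0, 0], [0, -0.6, 0]]   # edges 0 -> 1 -> 2
    cyclic = [[0, 0, 0.5], [0.8, 0, 0], [0, -0.6, 0]]  # adds 2 -> 0, closing a cycle
    print(topological_order(chain), topological_order(cyclic))
```

A structure search over unknown orderings has to keep every candidate acyclic; a check like this is one simple way to reject an edge that would close a cycle, not necessarily the mechanism the authors use.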

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics

Cite this

Estimation of sparse directed acyclic graphs for multivariate counts data. / Han, Sung Won; Zhong, Hua.

In: Biometrics, Vol. 72, No. 3, 01.09.2016, p. 791-803.


@article{5c148fe523884b4d87cf8323e214c4a2,
title = "Estimation of sparse directed acyclic graphs for multivariate counts data",
abstract = "Next-generation sequencing data, also called high-throughput sequencing data, are recorded as counts, which are generally far from normally distributed. Under the assumption that the counts follow a Poisson log-normal distribution, this article provides an L1-penalized likelihood framework and an efficient search algorithm to estimate the structure of sparse directed acyclic graphs (DAGs) for multivariate count data. In searching for the solution, we use iterative optimization procedures to estimate the adjacency matrix and the variance matrix of the latent variables. Simulation results show that the proposed method outperforms both the approach that assumes multivariate normal distributions and the log-transformation approach. They also show that the proposed method outperforms the rank-based PC method under sparse or hub network structures. As a real data example, we demonstrate the efficiency of the proposed method in estimating the gene regulatory networks of an ovarian cancer study.",
keywords = "Bayesian network, Count data, Directed acyclic graph, Lasso estimation, Penalized likelihood estimation, Unknown variable ordering",
author = "Han, {Sung Won} and Hua Zhong",
year = "2016",
month = sep,
day = "1",
doi = "10.1111/biom.12467",
language = "English",
volume = "72",
pages = "791--803",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "3",

}

TY  - JOUR
T1  - Estimation of sparse directed acyclic graphs for multivariate counts data
AU  - Han, Sung Won
AU  - Zhong, Hua
PY  - 2016/9/1
Y1  - 2016/9/1
N2  - Next-generation sequencing data, also called high-throughput sequencing data, are recorded as counts, which are generally far from normally distributed. Under the assumption that the counts follow a Poisson log-normal distribution, this article provides an L1-penalized likelihood framework and an efficient search algorithm to estimate the structure of sparse directed acyclic graphs (DAGs) for multivariate count data. In searching for the solution, we use iterative optimization procedures to estimate the adjacency matrix and the variance matrix of the latent variables. Simulation results show that the proposed method outperforms both the approach that assumes multivariate normal distributions and the log-transformation approach. They also show that the proposed method outperforms the rank-based PC method under sparse or hub network structures. As a real data example, we demonstrate the efficiency of the proposed method in estimating the gene regulatory networks of an ovarian cancer study.
AB  - Next-generation sequencing data, also called high-throughput sequencing data, are recorded as counts, which are generally far from normally distributed. Under the assumption that the counts follow a Poisson log-normal distribution, this article provides an L1-penalized likelihood framework and an efficient search algorithm to estimate the structure of sparse directed acyclic graphs (DAGs) for multivariate count data. In searching for the solution, we use iterative optimization procedures to estimate the adjacency matrix and the variance matrix of the latent variables. Simulation results show that the proposed method outperforms both the approach that assumes multivariate normal distributions and the log-transformation approach. They also show that the proposed method outperforms the rank-based PC method under sparse or hub network structures. As a real data example, we demonstrate the efficiency of the proposed method in estimating the gene regulatory networks of an ovarian cancer study.
KW  - Bayesian network
KW  - Count data
KW  - Directed acyclic graph
KW  - Lasso estimation
KW  - Penalized likelihood estimation
KW  - Unknown variable ordering
UR  - http://www.scopus.com/inward/record.url?scp=84985920020&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84985920020&partnerID=8YFLogxK
U2  - 10.1111/biom.12467
DO  - 10.1111/biom.12467
M3  - Article
C2  - 26849781
AN  - SCOPUS:84985920020
VL  - 72
SP  - 791
EP  - 803
JO  - Biometrics
JF  - Biometrics
SN  - 0006-341X
IS  - 3
ER  - 