Regression-Based Network Estimation for High-Dimensional Genetic Data

Kyu Min Lee, Minhyeok Lee, Junhee Seok, Sung Won Han

Research output: Contribution to journal (Article)

Abstract

Given the continuous advancement in genome sequencing technology, large volumes of gene expression data can be easily obtained. However, the corresponding increase in genetic information necessitates adoption of a new approach for network estimation. Data dimensions increase with the progress in genome sequencing technology, thereby making it difficult to estimate gene networks by causing multicollinearity. Furthermore, such a problem also occurs when hub nodes exist, where gene networks are known to have regulator genes that can be interpreted as hub nodes. This study aims to develop methods that perform well when handling high-dimensional data with hub nodes. We propose regression-based approaches as feasible solutions in this article. Elastic-net and adaptive elastic-net penalty regressions were applied to compensate for the disadvantages of existing regression-based approaches employing LASSO or adaptive LASSO. Experiments were performed to compare the proposed regression-based approaches with other conventional methods. We confirmed the superior performance of the regression-based approaches and applied them to actual genetic data to verify their suitability for estimating gene networks. The results demonstrate the robustness of the proposed methods with respect to high-dimensional gene expression data.
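
The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of one common reading of a "regression-based approach": regress each gene on all the others with an elastic-net penalty and draw an edge wherever a coefficient is nonzero. The function names, the coordinate-descent solver, the OR rule for combining the two regressions of a gene pair, the penalty values, and the toy data (gene 0 acting as a hub for genes 1 and 2, gene 3 unrelated) are all assumptions made for illustration and are not taken from the paper.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator that produces the sparsity of the L1 term."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net_cd(cols, y, alpha, l1_ratio, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.
    `cols` is a list of centered predictor columns; `y` is a centered response."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date after each update
    col_sq = [sum(v * v for v in c) / n for c in cols]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j excluded).
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, cols[j])]
                beta[j] = new
    return beta


def estimate_network(genes, alpha, l1_ratio):
    """Node-wise regression: gene j is regressed on all other genes.
    An edge (j, k) is kept if either regression selects the other gene (OR rule)."""
    p = len(genes)
    coef = [[0.0] * p for _ in range(p)]
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = elastic_net_cd([genes[k] for k in others], genes[j], alpha, l1_ratio)
        for k, b in zip(others, beta):
            coef[j][k] = b
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if coef[j][k] != 0.0 or coef[k][j] != 0.0}


def center(col):
    mean = sum(col) / len(col)
    return [v - mean for v in col]


# Hypothetical toy data: gene 0 is a hub driving genes 1 and 2; gene 3 is pure noise.
rng = random.Random(0)
n = 400
hub = [rng.gauss(0, 1) for _ in range(n)]
g1 = [0.6 * h + 0.8 * rng.gauss(0, 1) for h in hub]
g2 = [0.6 * h + 0.8 * rng.gauss(0, 1) for h in hub]
g3 = [rng.gauss(0, 1) for _ in range(n)]
genes = [center(c) for c in (hub, g1, g2, g3)]

edges = estimate_network(genes, alpha=0.25, l1_ratio=0.9)
print(sorted(edges))
```

In this sketch the hub edges (0, 1) and (0, 2) should be recovered and gene 3 should stay isolated. Whether the pair (1, 2) is also connected depends on the penalty values, because the two genes are correlated through the hub; in practice the penalties would be tuned, for example by cross-validation.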

Original language: English
Pages (from-to): 336-349
Number of pages: 14
Journal: Journal of Computational Biology
Volume: 26
Issue number: 4
DOI: 10.1089/cmb.2018.0225
Publication status: Published - 2019 Apr 1

Fingerprint

Gene Regulatory Networks
High-dimensional
Genes
Regression
Gene Networks
Elastic Net
Genome
Technology
Gene Expression
High-dimensional Data
Gene Expression Data
Sequencing
Regulator Genes
Gene expression
Vertex of a graph
Adaptive Lasso
Multicollinearity
Regulator
Estimate
Penalty

Keywords

  • adaptive elastic-net
  • gene network estimation
  • graphical model
  • regression-based approach

ASJC Scopus subject areas

  • Modelling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

Regression-Based Network Estimation for High-Dimensional Genetic Data. / Lee, Kyu Min; Lee, Minhyeok; Seok, Junhee; Han, Sung Won.

In: Journal of Computational Biology, Vol. 26, No. 4, 01.04.2019, p. 336-349.

@article{22d5f12344eb4af38157abd265c59a9d,
title = "Regression-Based Network Estimation for High-Dimensional Genetic Data",
abstract = "Given the continuous advancement in genome sequencing technology, large volumes of gene expression data can be easily obtained. However, the corresponding increase in genetic information necessitates adoption of a new approach for network estimation. Data dimensions increase with the progress in genome sequencing technology, thereby making it difficult to estimate gene networks by causing multicollinearity. Furthermore, such a problem also occurs when hub nodes exist, where gene networks are known to have regulator genes that can be interpreted as hub nodes. This study aims to develop methods that perform well when handling high-dimensional data with hub nodes. We propose regression-based approaches as feasible solutions in this article. Elastic-net and adaptive elastic-net penalty regressions were applied to compensate for the disadvantages of existing regression-based approaches employing LASSO or adaptive LASSO. Experiments were performed to compare the proposed regression-based approaches with other conventional methods. We confirmed the superior performance of the regression-based approaches and applied them to actual genetic data to verify their suitability for estimating gene networks. The results demonstrate the robustness of the proposed methods with respect to high-dimensional gene expression data.",
keywords = "adaptive elastic-net, gene network estimation, graphical model, regression-based approach",
author = "Lee, {Kyu Min} and Lee, Minhyeok and Seok, Junhee and Han, {Sung Won}",
year = "2019",
month = "4",
day = "1",
doi = "10.1089/cmb.2018.0225",
language = "English",
volume = "26",
pages = "336--349",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "4",

}

TY - JOUR

T1 - Regression-Based Network Estimation for High-Dimensional Genetic Data

AU - Lee, Kyu Min

AU - Lee, Minhyeok

AU - Seok, Junhee

AU - Han, Sung Won

PY - 2019/4/1

Y1 - 2019/4/1

AB - Given the continuous advancement in genome sequencing technology, large volumes of gene expression data can be easily obtained. However, the corresponding increase in genetic information necessitates adoption of a new approach for network estimation. Data dimensions increase with the progress in genome sequencing technology, thereby making it difficult to estimate gene networks by causing multicollinearity. Furthermore, such a problem also occurs when hub nodes exist, where gene networks are known to have regulator genes that can be interpreted as hub nodes. This study aims to develop methods that perform well when handling high-dimensional data with hub nodes. We propose regression-based approaches as feasible solutions in this article. Elastic-net and adaptive elastic-net penalty regressions were applied to compensate for the disadvantages of existing regression-based approaches employing LASSO or adaptive LASSO. Experiments were performed to compare the proposed regression-based approaches with other conventional methods. We confirmed the superior performance of the regression-based approaches and applied them to actual genetic data to verify their suitability for estimating gene networks. The results demonstrate the robustness of the proposed methods with respect to high-dimensional gene expression data.

KW - adaptive elastic-net

KW - gene network estimation

KW - graphical model

KW - regression-based approach

UR - http://www.scopus.com/inward/record.url?scp=85064082830&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064082830&partnerID=8YFLogxK

U2 - 10.1089/cmb.2018.0225

DO - 10.1089/cmb.2018.0225

M3 - Article

VL - 26

SP - 336

EP - 349

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 4

ER -