Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA

Yoonsuh Jung, Jianhua Z. Huang, Jianhua Hu

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

In genome-wide association studies, the primary task is to detect biomarkers in the form of single nucleotide polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits application of classical methods such as the multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this article, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a majorization–minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a multiple sclerosis dataset and simulated datasets and shows promise in biomarker detection.
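The ingredients named in the abstract (a logit-scale model for many SNPs at once, an L1 penalty on association effects, and a majorization-minimization update) can be illustrated with a deliberately simplified sketch. Everything below is an assumption made for illustration and is not the authors' model or algorithm: each SNP is coded as a binary indicator, there is a single centered phenotype covariate `x`, the interaction matrix is taken to be rank one (`x_i * b_j`), and the majorizer is the standard uniform quadratic bound on the Bernoulli negative log-likelihood (curvature at most 1/4). The names `fit_mm`, `objective`, and `soft_threshold` are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def soft_threshold(v, t):
    # Proximal operator of t * |.|, applied elementwise.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Y, x, mu, b, lam):
    # Penalized Bernoulli negative log-likelihood with
    # logit P(Y_ij = 1) = mu_j + x_i * b_j (assumed toy model).
    theta = mu[None, :] + np.outer(x, b)
    nll = np.sum(np.logaddexp(0.0, theta) - Y * theta)
    return nll + lam * np.sum(np.abs(b))

def fit_mm(Y, x, lam, n_iter=200):
    # Assumes x is not constant, so that x @ x > 0 after centering.
    x = x - x.mean()
    p = Y.shape[1]
    mu, b = np.zeros(p), np.zeros(p)
    history = [objective(Y, x, mu, b, lam)]
    for _ in range(n_iter):
        theta = mu[None, :] + np.outer(x, b)
        # Majorize: the quadratic bound turns the likelihood term into
        # (1/8) * ||theta - Z||^2 up to a constant, with working response Z.
        Z = theta + 4.0 * (Y - sigmoid(theta))
        # Minimize the surrogate exactly; because x is centered,
        # the intercepts and the slopes decouple.
        mu = Z.mean(axis=0)
        b = soft_threshold(x @ Z, 4.0 * lam) / (x @ x)
        history.append(objective(Y, x, mu, b, lam))
    return mu, b, history

# Simulated example: 50 SNPs, the first 5 associated with the phenotype.
rng = np.random.default_rng(0)
n, p = 200, 50
x = rng.integers(0, 2, size=n).astype(float)
b_true = np.zeros(p)
b_true[:5] = 1.5
theta_true = -0.5 + np.outer(x - x.mean(), b_true)
Y = (rng.random((n, p)) < sigmoid(theta_true)).astype(float)

mu_hat, b_hat, history = fit_mm(Y, x, lam=10.0)
print("SNPs with nonzero effect:", np.flatnonzero(b_hat))
```

Because each update minimizes the surrogate exactly, the penalized objective recorded in `history` cannot increase from one iteration to the next, which is the defining property of an MM scheme. The penalty level `lam` is fixed by hand here; the abstract describes selecting it, together with the rank, by a modified BIC, which this sketch does not attempt.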

Original language: English
Pages (from-to): 1355-1367
Number of pages: 13
Journal: Journal of the American Statistical Association
Volume: 109
Issue number: 508
DOI: 10.1080/01621459.2014.928217
Publication status: Published - 2014 Oct 2
Externally published: Yes

Fingerprint

Single Nucleotide Polymorphism
Biomarkers
Analysis of Variance
Logistics
Modeling
Interaction Effects
Genotype
Phenotype
Penalty
Reduced Rank
Multiple Sclerosis
Penalized Likelihood
Logit
Environmental Factors
Multiple Regression
Dimensionality Reduction
Logistic Regression
Polymorphism
Summation
Sample Size

Keywords

  • BIC
  • GWAS
  • L1-penalty
  • MM algorithm
  • Penalized Bernoulli likelihood
  • Simultaneous modeling of SNPs

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA. / Jung, Yoonsuh; Huang, Jianhua Z.; Hu, Jianhua.

In: Journal of the American Statistical Association, Vol. 109, No. 508, 02.10.2014, p. 1355-1367.

@article{1eaabaaafb5244118745fb284c6f9c7f,
title = "Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA",
abstract = "In genome-wide association studies, the primary task is to detect biomarkers in the form of single nucleotide polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits application of classical methods such as the multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this article, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a majorization–minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a multiple sclerosis dataset and simulated datasets and shows promise in biomarker detection.",
keywords = "BIC, GWAS, L1-penalty, MM algorithm, Penalized Bernoulli likelihood, Simultaneous modeling of SNPs",
author = "Yoonsuh Jung and Huang, {Jianhua Z.} and Jianhua Hu",
year = "2014",
month = "10",
day = "2",
doi = "10.1080/01621459.2014.928217",
language = "English",
volume = "109",
pages = "1355--1367",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "508",

}

TY - JOUR

T1 - Biomarker Detection in Association Studies

T2 - Modeling SNPs Simultaneously via Logistic ANOVA

AU - Jung, Yoonsuh

AU - Huang, Jianhua Z.

AU - Hu, Jianhua

PY - 2014/10/2

Y1 - 2014/10/2

AB - In genome-wide association studies, the primary task is to detect biomarkers in the form of single nucleotide polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits application of classical methods such as the multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this article, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a majorization–minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a multiple sclerosis dataset and simulated datasets and shows promise in biomarker detection.

KW - BIC

KW - GWAS

KW - L1-penalty

KW - MM algorithm

KW - Penalized Bernoulli likelihood

KW - Simultaneous modeling of SNPs

UR - http://www.scopus.com/inward/record.url?scp=84919796818&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84919796818&partnerID=8YFLogxK

U2 - 10.1080/01621459.2014.928217

DO - 10.1080/01621459.2014.928217

M3 - Article

AN - SCOPUS:84919796818

VL - 109

SP - 1355

EP - 1367

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 508

ER -