A 2-phased approach for detecting multiple loci associations with traits

Sunwon Lee, Jaewoo Kang, Junho Oh

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The recent advance in SNP genotyping has made a significant contribution to reduction of the costs for large-scale genotyping. The development also has dramatically increased the size of the SNP genotype data. The increase in the volume of the data, however, has posed a huge obstacle to the conventional analysis techniques that are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensions of the SNP genotype data by extracting significant SNPs through transformation of the data in lieu of the document-term model. In the second phase, we discover the association rules that signify the relations between the SNPs and the traits, through the application of transactional analysis in the reduced-dimension genotype data. We validated the discovered rules through literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH, which prove the validity of our new method for identifying appropriate dimensional reduction and associations of multiple SNPs and traits. This paper is an extended version of our workshop paper presented in the 2010 International Workshop on Data Mining for High Throughput Data from Genome-Wide Association Studies.
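The two phases described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how documents and terms are defined, so the sketch assumes that each trait class forms one "document", that each "SNP=genotype" pair is a "term", and that rules are restricted to at most two items. The toy data, the function names `tfidf_select` and `class_rules`, and all thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log

# Toy data (hypothetical): each individual is (set of "SNP=genotype" items, trait).
samples = [
    ({"rs1=AA", "rs2=CT", "rs3=GG"}, "tall"),
    ({"rs1=AA", "rs2=CT", "rs3=GG"}, "tall"),
    ({"rs1=AA", "rs2=TT", "rs3=GG"}, "tall"),
    ({"rs1=AG", "rs2=CC", "rs3=GG"}, "short"),
    ({"rs1=AG", "rs2=CC", "rs3=GG"}, "short"),
    ({"rs1=GG", "rs2=CC", "rs3=GG"}, "short"),
]

def tfidf_select(samples, min_score):
    """Phase 1 (assumed form): score items per trait 'document' with TF-IDF."""
    docs = {}
    for items, trait in samples:
        docs.setdefault(trait, Counter()).update(items)
    n_docs = len(docs)
    # Document frequency: in how many trait documents does each item occur?
    doc_freq = Counter(item for counts in docs.values() for item in counts)
    keep = set()
    for counts in docs.values():
        total = sum(counts.values())
        for item, count in counts.items():
            score = (count / total) * log(n_docs / doc_freq[item])
            if score >= min_score:
                keep.add(item)
    return keep

def class_rules(samples, keep, min_support, min_conf, max_len=2):
    """Phase 2 (assumed form): Apriori-style search for rules itemset -> trait."""
    txns = [(frozenset(items & keep), trait) for items, trait in samples]
    n = len(txns)
    rules = []
    candidates = {frozenset([item]) for txn, _ in txns for item in txn}
    for size in range(1, max_len + 1):
        frequent = set()
        for cand in candidates:
            traits = Counter(trait for txn, trait in txns if cand <= txn)
            covered = sum(traits.values())
            if covered / n < min_support:
                continue  # infrequent itemsets are not extended further
            frequent.add(cand)
            for trait, hits in traits.items():
                if hits / n >= min_support and hits / covered >= min_conf:
                    rules.append((tuple(sorted(cand)), trait, hits / n, hits / covered))
        # Join frequent itemsets into candidates that are one item larger.
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == size + 1}
    return sorted(rules)

keep = tfidf_select(samples, min_score=0.1)
rules = class_rules(samples, keep, min_support=0.3, min_conf=0.9)
for antecedent, trait, support, confidence in rules:
    print(antecedent, "->", trait, round(support, 2), round(confidence, 2))
```

In this toy setting, a genotype shared by every trait class (here `rs3=GG`) gets an inverse document frequency of zero and is dropped before rule mining, which is the sense in which the first phase reduces dimensionality.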

Original language: English
Pages (from-to): 535-556
Number of pages: 22
Journal: International Journal of Data Mining and Bioinformatics
Volume: 6
Issue number: 5
DOIs: 10.1504/IJDMB.2012.049318
Publication status: Published - 2012 Sep 1

Fingerprint

Single Nucleotide Polymorphism
Genotype
Association rules
Transactional Analysis
Data mining
Education
Genes
Genome-Wide Association Study
Throughput
model analysis
transaction
Costs
Costs and Cost Analysis
Experiments

Keywords

  • Apriori algorithm
  • Bioinformatics
  • Class association rule mining
  • Data mining
  • GWAS
  • SNP
  • Term frequency - inverse document frequency
  • TF-IDF

ASJC Scopus subject areas

  • Library and Information Sciences
  • Information Systems
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

A 2-phased approach for detecting multiple loci associations with traits. / Lee, Sunwon; Kang, Jaewoo; Oh, Junho.

In: International Journal of Data Mining and Bioinformatics, Vol. 6, No. 5, 01.09.2012, p. 535-556.


@article{597cd85f8a17488a9663f0debc2e3c34,
title = "A 2-phased approach for detecting multiple loci associations with traits",
abstract = "The recent advance in SNP genotyping has made a significant contribution to reduction of the costs for large-scale genotyping. The development also has dramatically increased the size of the SNP genotype data. The increase in the volume of the data, however, has posed a huge obstacle to the conventional analysis techniques that are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensions of the SNP genotype data by extracting significant SNPs through transformation of the data in lieu of the document-term model. In the second phase, we discover the association rules that signify the relations between the SNPs and the traits, through the application of transactional analysis in the reduced-dimension genotype data. We validated the discovered rules through literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH, which prove the validity of our new method for identifying appropriate dimensional reduction and associations of multiple SNPs and traits. This paper is an extended version of our workshop paper presented in the 2010 International Workshop on Data Mining for High Throughput Data from Genome-Wide Association Studies.",
keywords = "Apriori algorithm, Bioinformatics, Class association rule mining, Data mining, GWAS, SNP, Term frequency - inverse document frequency, TF-IDF",
author = "Sunwon Lee and Jaewoo Kang and Junho Oh",
year = "2012",
month = "9",
day = "1",
doi = "10.1504/IJDMB.2012.049318",
language = "English",
volume = "6",
pages = "535--556",
journal = "International Journal of Data Mining and Bioinformatics",
issn = "1748-5673",
publisher = "Inderscience Enterprises Ltd",
number = "5",

}

TY - JOUR

T1 - A 2-phased approach for detecting multiple loci associations with traits

AU - Lee, Sunwon

AU - Kang, Jaewoo

AU - Oh, Junho

PY - 2012/9/1

Y1 - 2012/9/1

N2 - The recent advance in SNP genotyping has made a significant contribution to reduction of the costs for large-scale genotyping. The development also has dramatically increased the size of the SNP genotype data. The increase in the volume of the data, however, has posed a huge obstacle to the conventional analysis techniques that are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensions of the SNP genotype data by extracting significant SNPs through transformation of the data in lieu of the document-term model. In the second phase, we discover the association rules that signify the relations between the SNPs and the traits, through the application of transactional analysis in the reduced-dimension genotype data. We validated the discovered rules through literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH, which prove the validity of our new method for identifying appropriate dimensional reduction and associations of multiple SNPs and traits. This paper is an extended version of our workshop paper presented in the 2010 International Workshop on Data Mining for High Throughput Data from Genome-Wide Association Studies.

AB - The recent advance in SNP genotyping has made a significant contribution to reduction of the costs for large-scale genotyping. The development also has dramatically increased the size of the SNP genotype data. The increase in the volume of the data, however, has posed a huge obstacle to the conventional analysis techniques that are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensions of the SNP genotype data by extracting significant SNPs through transformation of the data in lieu of the document-term model. In the second phase, we discover the association rules that signify the relations between the SNPs and the traits, through the application of transactional analysis in the reduced-dimension genotype data. We validated the discovered rules through literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH, which prove the validity of our new method for identifying appropriate dimensional reduction and associations of multiple SNPs and traits. This paper is an extended version of our workshop paper presented in the 2010 International Workshop on Data Mining for High Throughput Data from Genome-Wide Association Studies.

KW - Apriori algorithm

KW - Bioinformatics

KW - Class association rule mining

KW - Data mining

KW - GWAS

KW - SNP

KW - Term frequency - inverse document frequency

KW - TF-IDF

UR - http://www.scopus.com/inward/record.url?scp=84867051665&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867051665&partnerID=8YFLogxK

U2 - 10.1504/IJDMB.2012.049318

DO - 10.1504/IJDMB.2012.049318

M3 - Article

C2 - 23155781

AN - SCOPUS:84867051665

VL - 6

SP - 535

EP - 556

JO - International Journal of Data Mining and Bioinformatics

JF - International Journal of Data Mining and Bioinformatics

SN - 1748-5673

IS - 5

ER -