A scalable method for detecting multiple loci associated with traits using TF-IDF weighting and association rule mining

Sunwon Lee, Jaewoo Kang, Junho Oh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent advances in SNP genotyping have substantially reduced the cost of large-scale genotyping and, in turn, dramatically increased the size of SNP genotype data. This growth in data volume, however, poses a serious obstacle to conventional analysis techniques, which are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensionality of the SNP genotype data by transforming the data according to the document-term model and extracting the significant SNPs. In the second phase, we apply transaction analysis to the reduced-dimension genotype data to discover association rules that capture the relations between SNPs and traits. We validated the discovered rules through a literature survey. Experiments on the HGDP panel data provided by the Foundation Jean Dausset-CEPH demonstrate the validity of the method, both for dimensionality reduction and for identifying associations between multiple SNPs and traits.
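The abstract does not say how genotypes are mapped onto documents and terms, nor which rule-mining algorithm is used, so the following is only a minimal sketch of the two-phase idea under stated assumptions. It treats each trait group as a "document" and each (SNP, genotype) pair as a "term", scores SNPs by their best TF-IDF value, and then mines rules of the form {SNP genotypes} -> trait by brute force. The toy data, the function names `select_snps` and `mine_rules`, and the thresholds are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Toy data (hypothetical): one (trait label, {SNP id: genotype}) pair per individual.
SAMPLES = [
    ("A", {"rs1": "AA", "rs2": "CT", "rs3": "GG"}),
    ("A", {"rs1": "AA", "rs2": "CC", "rs3": "GG"}),
    ("A", {"rs1": "AA", "rs2": "CT", "rs3": "GG"}),
    ("B", {"rs1": "AG", "rs2": "CT", "rs3": "GG"}),
    ("B", {"rs1": "GG", "rs2": "CC", "rs3": "GG"}),
    ("B", {"rs1": "AG", "rs2": "CT", "rs3": "GG"}),
]


def select_snps(samples, top_k):
    """Phase 1 (assumed mapping): trait group = document, (SNP, genotype) = term."""
    group_size = Counter(trait for trait, _ in samples)
    counts = defaultdict(Counter)  # counts[trait][(snp, genotype)] = individuals
    for trait, genotypes in samples:
        counts[trait].update(genotypes.items())
    n_groups = len(group_size)
    # Document frequency: number of trait groups in which a term occurs at all.
    doc_freq = Counter(term for c in counts.values() for term in c)
    best = defaultdict(float)  # best TF-IDF score seen for each SNP
    for trait, c in counts.items():
        for (snp, genotype), n in c.items():
            tf = n / group_size[trait]
            idf = math.log(n_groups / doc_freq[(snp, genotype)])
            best[snp] = max(best[snp], tf * idf)
    # Highest score first; ties broken by SNP id for a deterministic order.
    return sorted(best, key=lambda s: (-best[s], s))[:top_k]


def mine_rules(samples, snps, min_support, min_confidence, max_len=2):
    """Phase 2: brute-force rules {SNP genotypes} -> trait on the reduced data."""
    transactions = [
        (frozenset((s, g[s]) for s in snps if s in g), trait)
        for trait, g in samples
    ]
    n = len(transactions)
    antecedent_count = Counter()
    rule_count = Counter()
    for items, trait in transactions:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                antecedent_count[combo] += 1
                rule_count[(combo, trait)] += 1
    rules = []
    for (combo, trait), hits in rule_count.items():
        support = hits / n                          # P(antecedent and trait)
        confidence = hits / antecedent_count[combo]  # P(trait | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, trait, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    selected = select_snps(SAMPLES, top_k=1)
    for combo, trait, support, confidence in mine_rules(SAMPLES, selected, 0.3, 0.9):
        print(combo, "->", trait, round(support, 2), round(confidence, 2))
```

In this toy data only rs1 has genotypes that are specific to one trait group, so it is the only SNP with a non-zero score. Genotypes shared by every group get an IDF of zero, which is what lets the first phase discard uninformative SNPs before the more expensive rule search runs. A real analysis would replace the brute-force enumeration with an Apriori- or FP-growth-style miner.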

Original language: English
Title of host publication: 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010
Pages: 318-323
Number of pages: 6
DOIs: https://doi.org/10.1109/BIBMW.2010.5703821
Publication status: Published - 2010 Dec 1
Event: 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010 - Hong Kong, China
Duration: 2010 Dec 18 to 2010 Dec 21

Other

Other: 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010
Country: China
City: Hong Kong
Period: 10/12/18 to 10/12/21

Fingerprint

Association rules
Single Nucleotide Polymorphism
Genotype
Transactional Analysis
Costs
Costs and Cost Analysis
Experiments

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Lee, S., Kang, J., & Oh, J. (2010). A scalable method for detecting multiple loci associated with traits using TF-IDF weighting and association rule mining. In 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010 (pp. 318-323). [5703821] https://doi.org/10.1109/BIBMW.2010.5703821

@inproceedings{5ca288e73d8948d88ca1f7b39c1c9c2c,
title = "A scalable method for detecting multiple loci associated with traits using TF-IDF weighting and association rule mining",
author = "Sunwon Lee and Jaewoo Kang and Junho Oh",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/BIBMW.2010.5703821",
language = "English",
isbn = "9781424483044",
pages = "318--323",
booktitle = "2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010",

}

TY - GEN

T1 - A scalable method for detecting multiple loci associated with traits using TF-IDF weighting and association rule mining

AU - Lee, Sunwon

AU - Kang, Jaewoo

AU - Oh, Junho

PY - 2010/12/1

Y1 - 2010/12/1

UR - http://www.scopus.com/inward/record.url?scp=79952018830&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952018830&partnerID=8YFLogxK

U2 - 10.1109/BIBMW.2010.5703821

DO - 10.1109/BIBMW.2010.5703821

M3 - Conference contribution

AN - SCOPUS:79952018830

SN - 9781424483044

SP - 318

EP - 323

BT - 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010

ER -