BRONCO

Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations

Kyubum Lee, Sunwon Lee, Sungjoon Park, Sunkyu Kim, Suhkyung Kim, Kwanghun Choi, Aik-Choon Tan, Jaewoo Kang

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.
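The quantitative evaluation mentioned in the abstract can be pictured as comparing a method's extracted relations against gold-standard annotations. The sketch below is a minimal illustration, not the authors' evaluation protocol. The `Relation` record, the exact-match criterion, and the toy tuples are all assumptions made for this example, and none of the data is drawn from BRONCO.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical record for one variant-entity relation (illustrative only)."""
    doc_id: str
    variant: str
    entity_type: str  # e.g. "gene", "disease", "drug", "cell_line"
    entity: str


def precision_recall_f1(gold: Set[Relation],
                        predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Score predictions against gold annotations using exact tuple matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy gold-standard annotations (invented for this example).
gold = {
    Relation("doc1", "V600E", "gene", "BRAF"),
    Relation("doc1", "V600E", "drug", "vemurafenib"),
    Relation("doc2", "T790M", "gene", "EGFR"),
    Relation("doc2", "T790M", "disease", "lung cancer"),
}

# Toy system output: two correct relations and one spurious one.
predicted = {
    Relation("doc1", "V600E", "gene", "BRAF"),
    Relation("doc2", "T790M", "gene", "EGFR"),
    Relation("doc2", "T790M", "drug", "gefitinib"),
}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest choice. A real evaluation might instead normalize variant mentions or score each entity type separately, which would change the numbers.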

Original language: English
Journal: Database
Volume: 2016
DOI: 10.1093/database/baw043
Publication status: Published - 2016

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Information Systems

Cite this

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations. / Lee, Kyubum; Lee, Sunwon; Park, Sungjoon; Kim, Sunkyu; Kim, Suhkyung; Choi, Kwanghun; Tan, Aik-Choon; Kang, Jaewoo.

In: Database, Vol. 2016, 2016.

@article{1e260bb5e87c4f0b855ff61776f2ae12,
title = "BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations",
author = "Kyubum Lee and Sunwon Lee and Sungjoon Park and Sunkyu Kim and Suhkyung Kim and Kwanghun Choi and Aik-Choon Tan and Jaewoo Kang",
year = "2016",
doi = "10.1093/database/baw043",
language = "English",
volume = "2016",
journal = "Database : the journal of biological databases and curation",
issn = "1758-0463",
publisher = "Oxford University Press",

}
