Development of a system for extracting the information of candidate tumor markers reported in biomedical literatures

Jeong Min Chae, Heung Bum Oh, Sung Eun Choi, Choong Hwan Cha, Myung Hee Kim, Soon Young Jung

Research output: Contribution to journal › Article

Abstract

Background: Since the Human Genome Project was completed in 2003, numerous reports on cancer and related markers have been published. This study aimed to develop a system that automatically extracts information on the relationship between cancers and tumor markers from the biomedical literature. Methods: Named entities of tumor markers were recognized by both a dictionary-based method and a machine learning method, the support vector machine. Named entities of cancers were recognized using the MeSH dictionary. Results: Relational and filtering keywords were selected after annotating 160 abstracts from PubMed. Relational information was extracted only when one of the relational keywords was in an appropriate position along the parse tree of a sentence containing both tumor marker and disease entities. The performance of the system was evaluated with a separate set of 77 abstracts. With the relational and filtering keywords, precision was 94.38% and recall was 66.14%; without this expert knowledge, precision was 49.16% and recall was 69.29%. Conclusions: We developed a system that extracts relational information between a tumor and its markers by incorporating expert knowledge. A system that exploits expert knowledge in this way could serve as a reference when developing information extraction systems in other medical fields.
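The keyword-based filtering and the precision/recall evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicons, the keyword lists, and the names `find_terms`, `extract_relations`, and `precision_recall` are all hypothetical, the support vector machine step is omitted, and the parse-tree position check is simplified to a test for keyword presence in the sentence.

```python
import re
from typing import NamedTuple, Set, Tuple

# Hypothetical toy lexicons. The paper used a tumor-marker dictionary plus an
# SVM for markers and the MeSH dictionary for cancers; neither is reproduced.
MARKERS = {"CEA", "PSA", "CA-125"}
CANCERS = {"colorectal cancer", "prostate cancer", "ovarian cancer"}

# Hypothetical expert keywords. The paper's actual lists are not given here.
RELATIONAL = {"marker", "associated", "overexpressed", "elevated"}
FILTERING = {"not", "no", "unrelated"}


class Relation(NamedTuple):
    marker: str
    cancer: str


def find_terms(sentence: str, lexicon: Set[str]) -> Set[str]:
    """Dictionary lookup: return lexicon terms that occur as whole words."""
    lowered = sentence.lower()
    return {
        term
        for term in lexicon
        if re.search(r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)", lowered)
    }


def extract_relations(sentence: str, use_keywords: bool = True) -> Set[Relation]:
    """Pair every marker with every cancer found in one sentence.

    With use_keywords=True, a sentence yields relations only if it contains
    a relational keyword and no filtering keyword. This stands in for the
    parse-tree position check, which is not modeled in this sketch.
    """
    markers = find_terms(sentence, MARKERS)
    cancers = find_terms(sentence, CANCERS)
    if not markers or not cancers:
        return set()
    if use_keywords:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if not (tokens & RELATIONAL) or (tokens & FILTERING):
            return set()
    return {Relation(m, c) for m in markers for c in cancers}


def precision_recall(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Precision = TP / predicted, recall = TP / gold (0.0 if undefined)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentences = [
        "Serum PSA is elevated in prostate cancer.",
        "CEA was measured in patients with ovarian cancer.",
        "CA-125 is not associated with colorectal cancer.",
    ]
    gold = {Relation("PSA", "prostate cancer")}
    for use_keywords in (True, False):
        predicted = set()
        for s in sentences:
            predicted |= extract_relations(s, use_keywords=use_keywords)
        print(use_keywords, precision_recall(predicted, gold))
```

On these toy sentences, dropping the keyword filter admits every co-occurring marker and cancer pair, which is the kind of precision loss the reported figures reflect; the toy numbers themselves carry no meaning beyond illustration.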

Original language: English
Pages (from-to): 79-87
Number of pages: 9
Journal: Korean Journal of Laboratory Medicine
Volume: 28
Issue number: 1
DOIs: 10.3343/kjlm.2008.28.1.79
Publication status: Published - 2008 Dec 1

Fingerprint

Tumor Biomarkers
Information Systems
Glossaries
Human Genome Project
Expert Systems
Neoplasms
Information Storage and Retrieval
PubMed
Support vector machines
Learning systems
Genes
Technology

Keywords

  • Information extraction
  • Tumor
  • Tumor marker

ASJC Scopus subject areas

  • Biochemistry, medical
  • Clinical Biochemistry

Cite this

Development of a system for extracting the information of candidate tumor markers reported in biomedical literatures. / Chae, Jeong Min; Oh, Heung Bum; Choi, Sung Eun; Cha, Choong Hwan; Kim, Myung Hee; Jung, Soon Young.

In: Korean Journal of Laboratory Medicine, Vol. 28, No. 1, 01.12.2008, p. 79-87.

@article{2540f5a6bc8b4536bf77ff9b56c64fb9,
title = "Development of a system for extracting the information of candidate tumor markers reported in biomedical literatures",
keywords = "Information extraction, Tumor, Tumor marker",
author = "Chae, {Jeong Min} and Oh, {Heung Bum} and Choi, {Sung Eun} and Cha, {Choong Hwan} and Kim, {Myung Hee} and Jung, {Soon Young}",
year = "2008",
month = "12",
day = "1",
doi = "10.3343/kjlm.2008.28.1.79",
language = "English",
volume = "28",
pages = "79--87",
journal = "Annals of Laboratory Medicine",
issn = "2234-3806",
publisher = "Seoul National University",
number = "1",

}

TY - JOUR

T1 - Development of a system for extracting the information of candidate tumor markers reported in biomedical literatures

AU - Chae, Jeong Min

AU - Oh, Heung Bum

AU - Choi, Sung Eun

AU - Cha, Choong Hwan

AU - Kim, Myung Hee

AU - Jung, Soon Young

PY - 2008/12/1

Y1 - 2008/12/1

KW - Information extraction

KW - Tumor

KW - Tumor marker

UR - http://www.scopus.com/inward/record.url?scp=58149109338&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58149109338&partnerID=8YFLogxK

U2 - 10.3343/kjlm.2008.28.1.79

DO - 10.3343/kjlm.2008.28.1.79

M3 - Article

C2 - 18309259

AN - SCOPUS:58149109338

VL - 28

SP - 79

EP - 87

JO - Annals of Laboratory Medicine

JF - Annals of Laboratory Medicine

SN - 2234-3806

IS - 1

ER -