ME-based biomedical named entity recognition using lexical knowledge

Kyung Mi Park, Seon Ho Kim, Hae-Chang Rim, Young Sook Hwang

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

In this paper, we present a two-phase biomedical NE-recognition method based on a ME model: we first recognize biomedical terms and then assign appropriate semantic classes to the recognized terms. In the two-phase NE-recognition method, the performance of the term-recognition phase is very important, because the semantic classification is performed on the region identified at the recognition phase. In this study, in order to improve the performance of term recognition, we try to incorporate lexical knowledge into pre- and postprocessing of the term-recognition phase. In the preprocessing step, we use domain-salient words as lexical knowledge obtained by corpus comparison. In the postprocessing step, we utilize χ²-based collocations gained from Medline corpus. In addition, we use morphological patterns extracted from the training data as features for learning the ME-based classifiers. Experimental results show that the performance of NE-recognition can be improved by utilizing such lexical knowledge.
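The abstract does not say how the χ² collocation scores are computed, so the following is only a minimal sketch of one common approach: Pearson's χ² statistic on the 2×2 contingency table of each adjacent word pair. The function name `chi_square_bigrams` and the toy token list are hypothetical and are not taken from the paper.

```python
from collections import Counter


def chi_square_bigrams(tokens):
    """Score each adjacent word pair with Pearson's chi-square statistic.

    Assumption: collocation candidates are adjacent word pairs (bigrams);
    the paper's actual candidate selection is not described in the abstract.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    first_counts = Counter(w1 for w1, _ in bigrams)
    second_counts = Counter(w2 for _, w2 in bigrams)

    scores = {}
    for (w1, w2), o11 in pair_counts.items():
        o12 = first_counts[w1] - o11   # w1 followed by some other word
        o21 = second_counts[w2] - o11  # w2 preceded by some other word
        o22 = n - o11 - o12 - o21      # pairs containing neither in position
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        # A zero denominator means a degenerate table; score it as 0.0.
        scores[(w1, w2)] = (
            n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0
        )
    return scores


tokens = ["a", "b", "a", "c", "a", "b"]
scores = chi_square_bigrams(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Pairs with higher scores co-occur more often than their individual frequencies would predict, which is the usual reason to treat them as collocation candidates. On a real corpus the scores would be computed over far more text than this toy list.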

Original language: English
Pages (from-to): 4-21
Number of pages: 18
Journal: ACM Transactions on Asian Language Information Processing
Volume: 5
Issue number: 1
DOI: 10.1145/1131348.1131350
Publication status: Published - 2006 Sep 13

Fingerprint

Semantics
Classifiers

Keywords

  • Biomedical term recognition
  • Collocations
  • Maximum-entropy model
  • Morphological patterns
  • Postprocessing
  • Preprocessing
  • Salient words
  • Semantic classification

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

ME-based biomedical named entity recognition using lexical knowledge. / Park, Kyung Mi; Kim, Seon Ho; Rim, Hae-Chang; Hwang, Young Sook.

In: ACM Transactions on Asian Language Information Processing, Vol. 5, No. 1, 13.09.2006, p. 4-21.


@article{cc337ef71bc34c4eac1c1318af70399d,
title = "ME-based biomedical named entity recognition using lexical knowledge",
abstract = "In this paper, we present a two-phase biomedical NE-recognition method based on a ME model: we first recognize biomedical terms and then assign appropriate semantic classes to the recognized terms. In the two-phase NE-recognition method, the performance of the term-recognition phase is very important, because the semantic classification is performed on the region identified at the recognition phase. In this study, in order to improve the performance of term recognition, we try to incorporate lexical knowledge into pre- and postprocessing of the term-recognition phase. In the preprocessing step, we use domain-salient words as lexical knowledge obtained by corpus comparison. In the postprocessing step, we utilize χ²-based collocations gained from Medline corpus. In addition, we use morphological patterns extracted from the training data as features for learning the ME-based classifiers. Experimental results show that the performance of NE-recognition can be improved by utilizing such lexical knowledge.",
keywords = "Biomedical term recognition, Collocations, Maximum-entropy model, Morphological patterns, Postprocessing, Preprocessing, Salient words, Semantic classification",
author = "Park, {Kyung Mi} and Kim, {Seon Ho} and Rim, Hae-Chang and Hwang, {Young Sook}",
year = "2006",
month = "9",
day = "13",
doi = "10.1145/1131348.1131350",
language = "English",
volume = "5",
pages = "4--21",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

TY  - JOUR
T1  - ME-based biomedical named entity recognition using lexical knowledge
AU  - Park, Kyung Mi
AU  - Kim, Seon Ho
AU  - Rim, Hae-Chang
AU  - Hwang, Young Sook
PY  - 2006/9/13
Y1  - 2006/9/13
N2  - In this paper, we present a two-phase biomedical NE-recognition method based on a ME model: we first recognize biomedical terms and then assign appropriate semantic classes to the recognized terms. In the two-phase NE-recognition method, the performance of the term-recognition phase is very important, because the semantic classification is performed on the region identified at the recognition phase. In this study, in order to improve the performance of term recognition, we try to incorporate lexical knowledge into pre- and postprocessing of the term-recognition phase. In the preprocessing step, we use domain-salient words as lexical knowledge obtained by corpus comparison. In the postprocessing step, we utilize χ²-based collocations gained from Medline corpus. In addition, we use morphological patterns extracted from the training data as features for learning the ME-based classifiers. Experimental results show that the performance of NE-recognition can be improved by utilizing such lexical knowledge.
AB  - In this paper, we present a two-phase biomedical NE-recognition method based on a ME model: we first recognize biomedical terms and then assign appropriate semantic classes to the recognized terms. In the two-phase NE-recognition method, the performance of the term-recognition phase is very important, because the semantic classification is performed on the region identified at the recognition phase. In this study, in order to improve the performance of term recognition, we try to incorporate lexical knowledge into pre- and postprocessing of the term-recognition phase. In the preprocessing step, we use domain-salient words as lexical knowledge obtained by corpus comparison. In the postprocessing step, we utilize χ²-based collocations gained from Medline corpus. In addition, we use morphological patterns extracted from the training data as features for learning the ME-based classifiers. Experimental results show that the performance of NE-recognition can be improved by utilizing such lexical knowledge.
KW  - Biomedical term recognition
KW  - Collocations
KW  - Maximum-entropy model
KW  - Morphological patterns
KW  - Postprocessing
KW  - Preprocessing
KW  - Salient words
KW  - Semantic classification
UR  - http://www.scopus.com/inward/record.url?scp=33748426379&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33748426379&partnerID=8YFLogxK
U2  - 10.1145/1131348.1131350
DO  - 10.1145/1131348.1131350
M3  - Article
AN  - SCOPUS:33748426379
VL  - 5
SP  - 4
EP  - 21
JO  - ACM Transactions on Asian Language Information Processing
JF  - ACM Transactions on Asian Language Information Processing
SN  - 1530-0226
IS  - 1
ER  -