Minimizing human intervention for constructing Korean part-of-speech tagged corpus

Do Gil Lee, Gumwon Hong, Seok Kee Lee, Hae-Chang Rim

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The construction of annotated corpora requires considerable manual effort. This paper presents a pragmatic method to minimize human intervention in constructing a Korean part-of-speech (POS) tagged corpus. Instead of focusing on improving the performance of conventional automatic POS taggers, we devise a discriminative POS tagger that selectively produces either a single analysis or multiple analyses, depending on the tagging reliability. The proposed approach uses two decision rules to judge the tagging reliability. Experimental results show that the proposed approach can effectively control both the quality of the corpus and the amount of manual annotation through the threshold value of the rule.
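
The selective-output idea in the abstract can be illustrated with a minimal sketch. It is not the authors' tagger: the abstract does not state the two decision rules, so the single rule below (accept the top analysis when its probability reaches a threshold), the function name `select_analyses`, the `(tags, probability)` representation, and the sample tag strings are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (tag sequence for one word, model probability).
Analysis = Tuple[str, float]


def select_analyses(candidates: List[Analysis], threshold: float) -> List[Analysis]:
    """Return one analysis when it looks reliable, otherwise all candidates."""
    if not candidates:
        return []
    # Rank candidates from most to least probable.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best = ranked[0]
    # Assumed reliability rule: trust the top analysis if its probability
    # reaches the threshold (the paper's actual rules are not given here).
    if best[1] >= threshold:
        return [best]
    # Otherwise leave the choice to a human annotator: return every
    # candidate, best first.
    return ranked


# Illustrative candidates only; the probabilities are made up.
candidates = [("NNG+JKS", 0.55), ("VV+ETM", 0.40), ("NNP+JKS", 0.05)]
confident = select_analyses(candidates, threshold=0.5)
uncertain = select_analyses(candidates, threshold=0.9)
```

In this sketch, raising the threshold sends more words to manual review, and lowering it accepts more automatic decisions. That is the kind of trade-off between corpus quality and annotation effort the abstract describes.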

Original language: English
Pages (from-to): 2336-2338
Number of pages: 3
Journal: IEICE Transactions on Information and Systems
Volume: E93-D
Issue number: 8
DOIs: 10.1587/transinf.E93.D.2336
Publication status: Published - 2010 Aug 1

Keywords

  • Morphological analysis
  • Part-of-speech tagging
  • Part-of-speech tagged corpus

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Minimizing human intervention for constructing Korean part-of-speech tagged corpus. / Lee, Do Gil; Hong, Gumwon; Lee, Seok Kee; Rim, Hae-Chang.

In: IEICE Transactions on Information and Systems, Vol. E93-D, No. 8, 01.08.2010, p. 2336-2338.

@article{16fb48adfefe4e8db7c6a784498f48ea,
title = "Minimizing human intervention for constructing Korean part-of-speech tagged corpus",
abstract = "The construction of annotated corpora requires considerable manual effort. This paper presents a pragmatic method to minimize human intervention for the construction of Korean part-of-speech (POS) tagged corpus. Instead of focusing on improving the performance of conventional automatic POS taggers, we devise a discriminative POS tagger which can selectively produce either a single analysis or multiple analyses based on the tagging reliability. The proposed approach uses two decision rules to judge the tagging reliability. Experimental results show that the proposed approach can effectively control the quality of corpus and the amount of manual annotation by the threshold value of the rule.",
keywords = "Morphological analysis, Part-of-speech tagging, Part-of-speech tagged corpus",
author = "Lee, {Do Gil} and Gumwon Hong and Lee, {Seok Kee} and Hae-Chang Rim",
year = "2010",
month = "8",
day = "1",
doi = "10.1587/transinf.E93.D.2336",
language = "English",
volume = "E93-D",
pages = "2336--2338",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "8",

}

TY - JOUR

T1 - Minimizing human intervention for constructing Korean part-of-speech tagged corpus

AU - Lee, Do Gil

AU - Hong, Gumwon

AU - Lee, Seok Kee

AU - Rim, Hae-Chang

PY - 2010/8/1

Y1 - 2010/8/1

AB - The construction of annotated corpora requires considerable manual effort. This paper presents a pragmatic method to minimize human intervention for the construction of Korean part-of-speech (POS) tagged corpus. Instead of focusing on improving the performance of conventional automatic POS taggers, we devise a discriminative POS tagger which can selectively produce either a single analysis or multiple analyses based on the tagging reliability. The proposed approach uses two decision rules to judge the tagging reliability. Experimental results show that the proposed approach can effectively control the quality of corpus and the amount of manual annotation by the threshold value of the rule.

KW - Morphological analysis

KW - Part-of-speech tagging

KW - Part-of-speech tagged corpus

UR - http://www.scopus.com/inward/record.url?scp=77956051119&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77956051119&partnerID=8YFLogxK

U2 - 10.1587/transinf.E93.D.2336

DO - 10.1587/transinf.E93.D.2336

M3 - Article

AN - SCOPUS:77956051119

VL - E93-D

SP - 2336

EP - 2338

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 8

ER -