Probabilistic Modeling of Korean Morphology

Do Gil Lee, Hae-Chang Rim

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

This paper proposes new probabilistic models for analyzing Korean morphology. In order to take advantage of the characteristics of Korean morphology, the proposed models are based on three linguistic units: eojeol (a Korean spacing unit), morpheme, and syllable. Unlike previous approaches that are based on rules and dictionaries, the probabilistic approach proposed in this study can automatically acquire complete linguistic knowledge from part-of-speech (POS) tagged corpora. In addition, this approach, without any system modification, is easily applicable to other corpora with different tagsets and annotation guidelines. The three different models and their combinations are evaluated on three corpora over a wide range of conditions. The eojeol-unit and syllable-unit models compensate for the weaknesses of the morpheme-unit model. The eojeol-unit model performed efficiently and improved precision. The syllable-unit model also improved precision, showing particularly robust performance in handling unknown words. The proposed approach is also shown to outperform the previous approaches.

Original language: English
Pages (from-to): 945-955
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 17
Issue number: 5
Publication status: Published - 2009

Keywords

  • Korean morphology
  • machine learning
  • morphological analysis
  • probabilistic model

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
