Korean spacing by improving viterbi segmentation

Gumwon Hong, Hae Chang Rim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper presents a Korean spacing approach that employs an improved Viterbi segmentation model. Traditional Viterbi segmentation using the word unigram language model is simple and fast, but has two problems: data sparseness and an improper preference for fewer segments. To overcome these limitations, the segmentation model is extended by employing a split probability based on character bigrams. Contextual information is selectively used to further resolve spacing ambiguities without much increase in complexity. Experimental results show that the extended model performs better than the traditional segmentation model. Furthermore, compared to the state-of-the-art system, our approach achieves better efficiency in terms of processing time without losing significant accuracy.
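The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `viterbi_segment`, the way the split term is combined with the word score, the default split probability of 0.5, the unknown-character penalty, and all toy probabilities are assumptions made for illustration only. The sketch runs Viterbi segmentation over a word unigram model and optionally adds a character-bigram split probability at every character boundary.

```python
import math

def viterbi_segment(text, word_logp, split_prob=None, unk_logp=-20.0):
    """Segment unspaced `text` into the highest-scoring word sequence.

    word_logp : dict mapping word -> log unigram probability (toy lexicon).
    split_prob: optional dict mapping a character bigram to the assumed
                probability (strictly between 0 and 1) of a space between
                its two characters; unseen bigrams default to 0.5.
    unk_logp  : assumed penalty for a single out-of-lexicon character.
    """
    n = len(text)
    max_len = max(map(len, word_logp), default=1)
    best = [-math.inf] * (n + 1)   # best[i]: best log score of text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word
    best[0] = 0.0

    def boundary_logp(i, is_split):
        # Log probability of (not) splitting between text[i-1] and text[i].
        if split_prob is None:
            return 0.0
        p = split_prob.get(text[i - 1:i + 1], 0.5)
        return math.log(p if is_split else 1.0 - p)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_logp:
                score = word_logp[word]
            elif end - start == 1:
                score = unk_logp   # fall back to a single unknown character
            else:
                continue
            # No split inside the word, one split right after it.
            score += sum(boundary_logp(i, False) for i in range(start + 1, end))
            if end < n:
                score += boundary_logp(end, True)
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Follow the back-pointers from the end of the text.
    words = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return " ".join(reversed(words))

# Toy, invented probabilities (not taken from the paper).
lexicon = {
    "아버지": math.log(0.05),
    "아버지가": math.log(0.02),
    "가방에": math.log(0.02),
    "방에": math.log(0.03),
}
splits = {"지가": 0.2, "가방": 0.9}

print(viterbi_segment("아버지가방에", lexicon))          # 아버지 가방에
print(viterbi_segment("아버지가방에", lexicon, splits))  # 아버지가 방에
```

With the unigram scores alone, the two readings differ only by the product of their word probabilities. Adding the split term lets character-level evidence about where spaces tend to occur change the decision. Which reading is correct depends on context, so the toy numbers only demonstrate the mechanism.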

Original language: English
Title of host publication: Proceedings - ALPIT 2007 6th International Conference on Advanced Language Processing and Web Information Technology
Pages: 75-80
Number of pages: 6
DOIs: https://doi.org/10.1109/ALPIT.2007.84
Publication status: Published - 2007
Event: 6th International Conference on Advanced Language Processing and Web Information Technology, ALPIT 2007 - Luoyang, Henan, China
Duration: 2007 Aug 22 → 2007 Aug 24

Publication series

Name: Proceedings - ALPIT 2007 6th International Conference on Advanced Language Processing and Web Information Technology

Other

Other: 6th International Conference on Advanced Language Processing and Web Information Technology, ALPIT 2007
Country: China
City: Luoyang, Henan
Period: 07/8/22 → 07/8/24

ASJC Scopus subject areas

  • Computer Science(all)
  • Information Systems

Cite this

Hong, G., & Rim, H. C. (2007). Korean spacing by improving viterbi segmentation. In Proceedings - ALPIT 2007 6th International Conference on Advanced Language Processing and Web Information Technology (pp. 75-80). [4460618] (Proceedings - ALPIT 2007 6th International Conference on Advanced Language Processing and Web Information Technology). https://doi.org/10.1109/ALPIT.2007.84