Acquiring Korean lexical entry from a raw corpus

Wonhee Yu, Kinam Park, Soon Young Jung, Heui Seok Lim

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper proposes a computational model of lexical entry acquisition based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as humans do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are acquired automatically according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme using the entropy measure of the syllables in the string. Experimental results on a Korean corpus of about 16 million full-forms show that the model successfully acquires major full-forms and morphemes with precisions of 100% and 99.04%, respectively.
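The two modules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the threshold values or the exact entropy formula, so every function name, the parameters `min_freq`, `max_gap`, `min_forms`, and `min_entropy`, and the use of Shannon entropy over the syllable that follows a candidate are assumptions made for illustration. The sketch also assumes each Hangul syllable is a single precomposed character, so indexing a Python string steps through syllables.

```python
import math
from collections import Counter, defaultdict


def acquire_full_forms(tokens, min_freq=2, max_gap=5):
    """Keep full-forms that are frequent and recently seen.

    Assumption: 'recency' means the last occurrence lies within
    max_gap positions of the end of the token stream.
    """
    freq = Counter()
    last_seen = {}
    for i, tok in enumerate(tokens):
        freq[tok] += 1
        last_seen[tok] = i
    end = len(tokens) - 1
    return {t for t in freq
            if freq[t] >= min_freq and end - last_seen[t] <= max_gap}


def candidate_morphemes(full_forms, min_forms=2):
    """Substrings that occur in at least min_forms different full-forms."""
    hosts = defaultdict(set)
    for form in full_forms:
        for i in range(len(form)):
            for j in range(i + 1, len(form) + 1):
                hosts[form[i:j]].add(form)
    return {s for s, h in hosts.items() if len(h) >= min_forms}


def right_entropy(candidate, full_forms):
    """Shannon entropy (bits) of the syllable following the candidate.

    The end of a full-form is counted as the pseudo-syllable '$'.
    """
    following = Counter()
    for form in full_forms:
        start = form.find(candidate)
        while start != -1:
            end = start + len(candidate)
            following[form[end] if end < len(form) else "$"] += 1
            start = form.find(candidate, start + 1)
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total)
                for c in following.values())


def acquire_morphemes(full_forms, min_forms=2, min_entropy=1.0):
    """Accept candidates whose right-side entropy reaches min_entropy."""
    forms = set(full_forms)
    return {s for s in candidate_morphemes(forms, min_forms)
            if right_entropy(s, forms) >= min_entropy}


if __name__ == "__main__":
    stream = ["학교에", "학교가", "학교에", "집에", "학교가"]
    print(acquire_full_forms(stream))
    forms = ["학교에", "학교가", "학교는", "학생이"]
    print(acquire_morphemes(forms))
```

On the toy list in the example, 학교 is followed by three different syllables and passes the assumed threshold, while 학 is followed mostly by 교 and is rejected. The fragment 교 also passes, which shows that a single right-side entropy is too weak on its own; a fuller model would need additional evidence, such as the entropy on the left side or the frequency information from the full-form module.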

Original language: English
Title of host publication: 2010 2nd International Conference on Information Technology Convergence and Services, ITCS 2010
DOI: https://doi.org/10.1109/ITCS.2010.5581289
Publication status: Published - 2010 Nov 11
Event: 2010 2nd International Conference on Information Technology Convergence and Services, ITCS 2010 - Cebu, Philippines
Duration: 2010 Aug 11 to 2010 Aug 13

Other

Other: 2010 2nd International Conference on Information Technology Convergence and Services, ITCS 2010
Country: Philippines
City: Cebu
Period: 10/8/11 to 10/8/13

Fingerprint

Unsupervised learning
Entropy

Keywords

  • Language learning
  • Lexical acquisition
  • Machine readable dictionary
  • Mental lexicon

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Yu, W., Park, K., Jung, S. Y., & Lim, H. S. (2010). Acquiring Korean lexical entry from a raw corpus. In 2010 2nd International Conference on Information Technology Convergence and Services, ITCS 2010 [5581289] https://doi.org/10.1109/ITCS.2010.5581289

@inproceedings{83cfc914c7d2403e81fee795e054cb89,
title = "Acquiring Korean lexical entry from a raw corpus",
abstract = "This paper proposes a computational model of lexical entry acquisition based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as humans do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are acquired automatically according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme using the entropy measure of the syllables in the string. Experimental results on a Korean corpus of about 16 million full-forms show that the model successfully acquires major full-forms and morphemes with precisions of 100{\%} and 99.04{\%}, respectively.",
keywords = "Language learning, Lexical acquisition, Machine readable dictionary, Mental lexicon",
author = "Wonhee Yu and Kinam Park and Jung, {Soon Young} and Lim, {Heui Seok}",
year = "2010",
month = "11",
day = "11",
doi = "10.1109/ITCS.2010.5581289",
language = "English",
isbn = "9781424475858",
booktitle = "2010 2nd International Conference on Information Technology Convergence and Services, ITCS 2010",

}

TY - GEN

T1 - Acquiring Korean lexical entry from a raw corpus

AU - Yu, Wonhee

AU - Park, Kinam

AU - Jung, Soon Young

AU - Lim, Heui Seok

PY - 2010/11/11

Y1 - 2010/11/11

AB - This paper proposes a computational model of lexical entry acquisition based on a representation model of the mental lexicon. The proposed model acquires lexical entries from a raw corpus by unsupervised learning, as humans do. The model is composed of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are acquired automatically according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme using the entropy measure of the syllables in the string. Experimental results on a Korean corpus of about 16 million full-forms show that the model successfully acquires major full-forms and morphemes with precisions of 100% and 99.04%, respectively.

KW - Language learning

KW - Lexical acquisition

KW - Machine readable dictionary

KW - Mental lexicon

UR - http://www.scopus.com/inward/record.url?scp=78049497049&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78049497049&partnerID=8YFLogxK

U2 - 10.1109/ITCS.2010.5581289

DO - 10.1109/ITCS.2010.5581289

M3 - Conference contribution

SN - 9781424475858

BT - 2010 2nd International Conference on Information Technology Convergence and Services, ITCS 2010

ER -