Unsupervised lexical entry acquisition model based on representation of human mental lexicon

Wonhee Yu, Doo Soon Park, Taeweon Suh, Heuiseok Lim

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. Like human beings, the proposed model acquires lexical entries from a raw corpus by unsupervised learning. The model is composed of full-form and morpheme acquisition modules. In the full-form acquisition module, core full-forms are automatically acquired according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then corroborated as a morpheme by using the entropy measure of syllables in the string. We tested the model with a Korean raw corpus of about 16 million full-forms. The test results show that the model successfully acquires major Korean full-forms and morphemes, with an average precision of 100% and 99.04%, respectively. In addition, we observed a vocabulary spurt during learning, which is a phenomenon peculiar to children's language learning process.
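The morpheme acquisition step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact candidate-selection rule or entropy formula, so everything here is an assumption made for illustration: the function names `candidate_substrings` and `successor_entropy`, the rule that a candidate must occur in at least two distinct full-forms, and the use of Shannon entropy over the syllable that follows the candidate. It also assumes precomposed (NFC) Hangul, so that one Python character corresponds to one syllable.

```python
import math
from collections import Counter, defaultdict


def candidate_substrings(full_forms, min_forms=2):
    """Return substrings found in at least `min_forms` distinct full-forms.

    Assumption: `min_forms=2` stands in for "repeatedly occurring in
    different full-forms"; the paper's actual criterion is not given here.
    """
    seen_in = defaultdict(set)
    for form in set(full_forms):
        for i in range(len(form)):
            for j in range(i + 1, len(form) + 1):
                seen_in[form[i:j]].add(form)
    return {s for s, forms in seen_in.items() if len(forms) >= min_forms}


def successor_entropy(candidate, full_forms):
    """Shannon entropy (in bits) of the syllable that follows `candidate`.

    The end of a full-form is counted as the pseudo-syllable '#'.
    Assumption: a high value suggests a morpheme boundary after the
    candidate; this is a common heuristic, not necessarily the paper's.
    """
    counts = Counter()
    for form in full_forms:
        start = form.find(candidate)
        while start != -1:
            end = start + len(candidate)
            counts[form[end] if end < len(form) else "#"] += 1
            start = form.find(candidate, start + 1)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy data (hypothetical, not from the paper's corpus).
forms = ["학교에", "학교는", "학교가", "학생이"]
candidates = candidate_substrings(forms)
for cand in sorted(candidates):
    print(cand, round(successor_entropy(cand, forms), 3))
```

In this toy data, "학교" is followed by three different syllables while "학" is followed mostly by "교", so the entropy after "학교" is higher. Under the assumed heuristic, that makes "학교" the more plausible morpheme. A threshold on this value would be needed to make the final decision, and the abstract does not state one.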

Original language: English
Pages (from-to): 2229-2241
Number of pages: 13
Journal: Information
Volume: 14
Issue number: 7
Publication status: Published - 2011 Jul

Keywords

  • Language learning
  • Lexical acquisition
  • Machine readable dictionary
  • Mental lexicon

ASJC Scopus subject areas

  • Information Systems
