Unsupervised lexical entry acquisition model based on representation of human mental lexicon

Wonhee Yu, Doo Soon Park, Taeweon Suh, Heui Seok Lim

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. Like human learners, the proposed model acquires lexical entries from a raw corpus by unsupervised learning. The model consists of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are acquired automatically according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then confirmed as a morpheme using the entropy measure of syllables in the string. We tested the model on a raw Korean corpus of about 16 million full-forms. The results show that the model successfully acquires major Korean full-forms and morphemes, with average precision of 100% and 99.04%, respectively. In addition, we observed a vocabulary spurt during learning, a phenomenon characteristic of children's language learning.
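The two modules described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the default thresholds, the reading of "recency" as a count that resets after a long gap, and the use of successor (branching) entropy as the entropy measure are all hypothetical, since the abstract does not specify them. Each precomposed Hangul character is treated as one syllable.

```python
import math
from collections import Counter, defaultdict


def acquire_fullforms(tokens, freq_threshold=3, recency_window=1000):
    """Collect full-forms seen freq_threshold times without a long gap.

    Assumption: "recency" means the running count of a form resets when
    the form has not been seen for more than recency_window tokens.
    """
    count = {}
    last_seen = {}
    lexicon = set()
    for position, form in enumerate(tokens):
        if form in last_seen and position - last_seen[form] > recency_window:
            count[form] = 0  # too long since the last encounter: start over
        count[form] = count.get(form, 0) + 1
        last_seen[form] = position
        if count[form] >= freq_threshold:
            lexicon.add(form)
    return lexicon


def candidate_morphemes(fullforms, min_forms=2, min_len=1):
    """Return substrings that occur in at least min_forms different full-forms."""
    containing = defaultdict(set)
    for form in fullforms:
        for i in range(len(form)):
            for j in range(i + min_len, len(form) + 1):
                containing[form[i:j]].add(form)
    return {s for s, forms in containing.items() if len(forms) >= min_forms}


def successor_entropy(candidate, fullforms):
    """Shannon entropy (bits) of the syllable following a candidate prefix.

    Assumption: a high value suggests a morpheme boundary after the
    candidate. "#" stands for the end of a full-form.
    """
    followers = Counter()
    for form in fullforms:
        if form.startswith(candidate):
            followers[form[len(candidate):len(candidate) + 1] or "#"] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in followers.values())
```

Under these assumptions, given the full-forms 학교에, 학교는 and 학교가, the prefix 학교 is followed by three different syllables, so its successor entropy is log2(3), about 1.58 bits, whereas 학 is always followed by 교 and scores 0. A threshold on this value would then decide which candidates are kept as morphemes.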

Original language: English
Pages (from-to): 2229-2241
Number of pages: 13
Journal: Information
Volume: 14
Issue number: 7
Publication status: Published - 2011 Jul 1

Fingerprint

Unsupervised learning
Entropy

Keywords

  • Language learning
  • Lexical acquisition
  • Machine readable dictionary
  • Mental lexicon

ASJC Scopus subject areas

  • General

Cite this

Unsupervised lexical entry acquisition model based on representation of human mental lexicon. / Yu, Wonhee; Park, Doo Soon; Suh, Taeweon; Lim, Heui Seok.

In: Information, Vol. 14, No. 7, 01.07.2011, p. 2229-2241.


@article{db07161aa062469bb046596148eeffca,
title = "Unsupervised lexical entry acquisition model based on representation of human mental lexicon",
abstract = "This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. Like human learners, the proposed model acquires lexical entries from a raw corpus by unsupervised learning. The model consists of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are acquired automatically according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then confirmed as a morpheme using the entropy measure of syllables in the string. We tested the model on a raw Korean corpus of about 16 million full-forms. The results show that the model successfully acquires major Korean full-forms and morphemes, with average precision of 100{\%} and 99.04{\%}, respectively. In addition, we observed a vocabulary spurt during learning, a phenomenon characteristic of children's language learning.",
keywords = "Language learning, Lexical acquisition, Machine readable dictionary, Mental lexicon",
author = "Wonhee Yu and Park, {Doo Soon} and Taeweon Suh and Lim, {Heui Seok}",
year = "2011",
month = "7",
day = "1",
language = "English",
volume = "14",
pages = "2229--2241",
journal = "Information (Japan)",
issn = "1343-4500",
publisher = "International Information Institute",
number = "7",

}

TY - JOUR

T1 - Unsupervised lexical entry acquisition model based on representation of human mental lexicon

AU - Yu, Wonhee

AU - Park, Doo Soon

AU - Suh, Taeweon

AU - Lim, Heui Seok

PY - 2011/7/1

Y1 - 2011/7/1

N2 - This paper proposes a computational lexical entry acquisition model based on a representation model of the mental lexicon. Like human learners, the proposed model acquires lexical entries from a raw corpus by unsupervised learning. The model consists of a full-form acquisition module and a morpheme acquisition module. In the full-form acquisition module, core full-forms are acquired automatically according to frequency and recency thresholds. In the morpheme acquisition module, a substring that occurs repeatedly in different full-forms is chosen as a candidate morpheme. The candidate is then confirmed as a morpheme using the entropy measure of syllables in the string. We tested the model on a raw Korean corpus of about 16 million full-forms. The results show that the model successfully acquires major Korean full-forms and morphemes, with average precision of 100% and 99.04%, respectively. In addition, we observed a vocabulary spurt during learning, a phenomenon characteristic of children's language learning.


KW - Language learning

KW - Lexical acquisition

KW - Machine readable dictionary

KW - Mental lexicon

UR - http://www.scopus.com/inward/record.url?scp=84860124432&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860124432&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84860124432

VL - 14

SP - 2229

EP - 2241

JO - Information (Japan)

JF - Information (Japan)

SN - 1343-4500

IS - 7

ER -