Building a large-scale commonsense knowledge base by converting an existing one in a different language

Yuchul Jung, Joo Young Lee, Youngho Kim, Jaehyun Park, Sung Hyon Myaeng, Hae Chang Rim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing English one, ConceptNet. The English commonsense knowledge base is essentially a huge net consisting of concepts and relations. Triplets of the form Concept-Relation-Concept in the net were extracted from English sentences that volunteers interested in contributing commonsense knowledge entered through a Web site. Our effort is an attempt to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions, i.e., concept-relation-concept triplets, in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgments of a 5% sample, the accuracy was 84.4%.
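As a rough illustration of the data involved, the sketch below models an assertion as a concept-relation-concept triplet and converts a few English assertions with a toy lexicon, dropping any assertion that contains an out-of-vocabulary concept. It is not the authors' method: the `Assertion` type, the `convert` function, and the three-word `TOY_LEXICON` are hypothetical stand-ins for the machine translation software and language resources mentioned in the abstract, and the relation labels are used only as examples. The last lines compute the conversion yield implied by the reported figures.

```python
from typing import NamedTuple, Optional


class Assertion(NamedTuple):
    """One concept-relation-concept triplet."""
    left: str
    relation: str
    right: str


# Hypothetical toy lexicon (assumption): a stand-in for the translation
# resources described in the abstract, not the paper's actual data.
TOY_LEXICON = {"dog": "개", "animal": "동물", "bark": "짖다"}


def convert(a: Assertion, lexicon: dict[str, str]) -> Optional[Assertion]:
    """Translate both concepts; return None if either is out of vocabulary."""
    left = lexicon.get(a.left)
    right = lexicon.get(a.right)
    if left is None or right is None:
        return None  # a real system would apply OOV handling here instead
    return Assertion(left, a.relation, right)


english = [
    Assertion("dog", "IsA", "animal"),
    Assertion("dog", "CapableOf", "bark"),
    Assertion("dog", "AtLocation", "kennel"),  # "kennel" is not in the toy lexicon
]

# Keep only the assertions that could be converted.
korean = [k for a in english if (k := convert(a, TOY_LEXICON)) is not None]

# Yield implied by the abstract: about 200,000 of 2.4 million assertions.
reported_yield = 200_000 / 2_400_000
print(f"{len(korean)} of {len(english)} toy assertions converted")
print(f"reported yield: {reported_yield:.1%}")
```

The reported figures correspond to roughly one Korean assertion for every twelve English ones, which is consistent with the abstract's description of the result as work in progress ("so far").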

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
Pages: 23-34
Number of pages: 12
Publication status: Published - 2007
Event: 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007 - Mexico City, Mexico
Duration: 2007 Feb 18 to 2007 Feb 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4394 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Country: Mexico
City: Mexico City
Period: 07/2/18 to 07/2/24

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Jung, Y., Lee, J. Y., Kim, Y., Park, J., Myaeng, S. H., & Rim, H. C. (2007). Building a large-scale commonsense knowledge base by converting an existing one in a different language. In Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings (pp. 23-34). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4394 LNCS).