Alleviating syntactic term mismatches in Korean text retrieval

Bo Hyun Yun, Yong Jae Kwak, Hae-Chang Rim

Research output: Contribution to journal (Article)

Abstract

In Korean information retrieval, syntactic term mismatches between index terms and query terms have been a serious obstacle to improving retrieval performance. Conventional approaches try to alleviate syntactic term mismatches either by segmenting compound nouns or by normalizing different representations of noun phrases. However, using segmentation alone may inflate similarity measurements unnecessarily, since the segmented unit nouns cannot discriminate between different formations of compound nouns. On the other hand, using normalization alone is limited in how far it can alleviate syntactic term mismatches, because normalized phrases are highly specific. In this paper, we propose a Korean information retrieval system that alleviates syntactic term mismatches by segmenting compound nouns as well as by normalizing noun phrases, and that provides appropriate similarity measurements. In the indexing module, we segment compound nouns using statistical information and normalize noun phrases using dependency relations. Then, we extract terms with boundary information attached. Finally, terms are weighted by a newly devised weighting scheme appropriate for Korean noun phrases. In the retrieval module, we compute the similarity with partial matching taken into account by using the boundary information. The experimental results show that the proposed method can alleviate syntactic term mismatches and improve precision without decreasing recall.
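The contrast the abstract draws between segmentation-only matching and boundary-aware partial matching can be illustrated with a small sketch. This is not the paper's weighting scheme or similarity formula, neither of which is given in the abstract. The function names, the `partial_weight` parameter, the scoring rule, and the English stand-in phrases are all assumptions made for illustration only.

```python
from collections import Counter

# Each phrase is a tuple of unit nouns; the tuple itself plays the role of
# "boundary information" (which unit nouns belonged to the same phrase).
# English glosses stand in for Korean unit nouns (illustrative assumption).

def unit_overlap(query_phrases, doc_phrases):
    """Segmentation-only baseline: count shared unit nouns, ignoring boundaries."""
    q = Counter(noun for phrase in query_phrases for noun in phrase)
    d = Counter(noun for phrase in doc_phrases for noun in phrase)
    return sum((q & d).values())

def boundary_aware_score(query_phrases, doc_phrases, partial_weight=0.5):
    """Hypothetical scoring rule: full credit for an exact phrase match,
    discounted credit for unit nouns shared with a single document phrase."""
    score = 0.0
    for qp in query_phrases:
        best = 0.0
        for dp in doc_phrases:
            if qp == dp:
                best = max(best, 1.0)
            else:
                shared = len(set(qp) & set(dp))
                best = max(best, partial_weight * shared / len(qp))
        score += best
    return score

query = [("information", "retrieval")]
doc_a = [("information", "retrieval")]                       # same compound
doc_b = [("information", "system"), ("image", "retrieval")]  # nouns split across phrases

print(unit_overlap(query, doc_a), unit_overlap(query, doc_b))
print(boundary_aware_score(query, doc_a), boundary_aware_score(query, doc_b))
```

Under these assumptions, the baseline cannot tell the two documents apart because both contain the same unit nouns, while the boundary-aware score ranks the document containing the intact compound higher and still gives the other document partial credit.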

Original language: English
Pages (from-to): 481-500
Number of pages: 20
Journal: Information Processing and Management
Volume: 35
Issue number: 4
Publication status: Published - 1999 Dec 1

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Library and Information Sciences
