Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models

Jung Tae Lee, Sang Bum Kim, Young In Song, Hae-Chang Rim

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

40 Citations (Scopus)

Abstract

Lexical gaps between queries and questions (documents) have been a major issue in question retrieval on large online question and answer (Q&A) collections. Previous studies address the issue by implicitly expanding queries with the help of translation models pre-constructed using statistical techniques. However, since it is possible for unimportant words (e.g., non-topical words, common words) to be included in the translation models, a lack of noise control on the models can cause degradation of retrieval performance. This paper investigates a number of empirical methods for eliminating unimportant words in order to construct compact translation models for retrieval purposes. Experiments conducted on a real world Q&A collection show that substantial improvements in retrieval performance can be achieved by using compact translation models.
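The idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific elimination methods, so the two filters used here (a stop-word list and a probability threshold) are assumptions chosen only for illustration, as are the toy probabilities, the function names, and the smoothing weight. The scoring function follows the general shape of a translation-based query likelihood, in which each query word can be generated by translating any word of a candidate question.

```python
import math
from collections import Counter

# Hypothetical toy table: raw_table[t][w] stands for T(w | t), the probability
# that question word t translates to query word w. Values are made up.
raw_table = {
    "laptop": {"laptop": 0.5, "notebook": 0.3, "the": 0.15, "buy": 0.05},
    "fix": {"fix": 0.6, "repair": 0.3, "the": 0.1},
}
STOPWORDS = {"the"}  # assumed list of unimportant words


def compact_table(table, stopwords, min_prob):
    """Drop stop words and low-probability entries, then renormalize each row."""
    compact = {}
    for source, targets in table.items():
        kept = {w: p for w, p in targets.items()
                if w not in stopwords and p >= min_prob}
        total = sum(kept.values())
        if total > 0:
            compact[source] = {w: p / total for w, p in kept.items()}
    return compact


def translation_score(query, question, table, background, lam=0.2):
    """Log-likelihood of the query given a question under a translation model.

    For each query word w: P(w | question) mixes the translation probability
    sum_t T(w | t) * P(t | question) with a background probability (weight lam).
    """
    counts = Counter(question)
    length = len(question)
    log_p = 0.0
    for w in query:
        translated = sum(table.get(t, {}).get(w, 0.0) * c / length
                         for t, c in counts.items())
        # Small floor keeps the logarithm defined for unseen words.
        p = (1 - lam) * translated + lam * background.get(w, 1e-9)
        log_p += math.log(p)
    return log_p


compact = compact_table(raw_table, STOPWORDS, min_prob=0.1)
background = {"repair": 0.01, "notebook": 0.01}  # assumed collection model
query = ["repair", "notebook"]
related = ["fix", "laptop", "screen"]
unrelated = ["bake", "bread", "recipe"]
related_score = translation_score(query, related, compact, background)
unrelated_score = translation_score(query, unrelated, compact, background)
```

In this toy setup the query shares no words with the related question, yet the translation entries let it outscore the unrelated one. Pruning removes "the" and "buy" from the "laptop" row so that probability mass is not spent on words that carry no topical information.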

Original language: English
Title of host publication: EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL
Pages: 410-418
Number of pages: 9
Publication status: Published - 2008 Dec 1
Event: 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation - Honolulu, HI, United States
Duration: 2008 Oct 25 to 2008 Oct 27

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Lee, J. T., Kim, S. B., Song, Y. I., & Rim, H-C. (2008). Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (pp. 410-418)

@inproceedings{157e44fa2f9d40799432a988ec384cae,
title = "Bridging lexical gaps between queries and questions on large online Q\&A collections with compact translation models",
abstract = "Lexical gaps between queries and questions (documents) have been a major issue in question retrieval on large online question and answer (Q\&A) collections. Previous studies address the issue by implicitly expanding queries with the help of translation models pre-constructed using statistical techniques. However, since it is possible for unimportant words (e.g., non-topical words, common words) to be included in the translation models, a lack of noise control on the models can cause degradation of retrieval performance. This paper investigates a number of empirical methods for eliminating unimportant words in order to construct compact translation models for retrieval purposes. Experiments conducted on a real world Q\&A collection show that substantial improvements in retrieval performance can be achieved by using compact translation models.",
author = "Lee, {Jung Tae} and Kim, {Sang Bum} and Song, {Young In} and Rim, Hae-Chang",
year = "2008",
month = "12",
day = "1",
language = "English",
pages = "410--418",
booktitle = "EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL"
}

TY - GEN
T1 - Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models
AU - Lee, Jung Tae
AU - Kim, Sang Bum
AU - Song, Young In
AU - Rim, Hae-Chang
PY - 2008/12/1
Y1 - 2008/12/1
AB - Lexical gaps between queries and questions (documents) have been a major issue in question retrieval on large online question and answer (Q&A) collections. Previous studies address the issue by implicitly expanding queries with the help of translation models pre-constructed using statistical techniques. However, since it is possible for unimportant words (e.g., non-topical words, common words) to be included in the translation models, a lack of noise control on the models can cause degradation of retrieval performance. This paper investigates a number of empirical methods for eliminating unimportant words in order to construct compact translation models for retrieval purposes. Experiments conducted on a real world Q&A collection show that substantial improvements in retrieval performance can be achieved by using compact translation models.
UR - http://www.scopus.com/inward/record.url?scp=80053349509&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053349509&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053349509
SP - 410
EP - 418
BT - EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL
ER -