Simple weighting techniques for query expansion in biomedical document retrieval

Young In Song, Kyoung Soo Han, So Young Park, Sang Bum Kim, Hae-Chang Rim

Research output: Contribution to journal (Article)

Abstract

In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks a document containing a longer term highly, because a longer term has more chances to match the query. However, such a preference is clearly inappropriate and often yields unsatisfactory results. To alleviate this weighting bias, we devise a method of normalizing the weights of the query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency, a novel statistic estimated in a query domain. Experimental results on the MEDLINE corpus show that our two simple techniques improve retrieval performance by correcting the inappropriate preference for long multi-word terms in an expanded query.
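
The abstract does not give the formulas, so the following Python sketch is only an illustration of the two ideas under stated assumptions, not the authors' method. It assumes a toy scorer that adds one unit per matched query word, a normalization that splits a total weight of 1 evenly across the words of each term, and an IDF-style definition of inverse terminology frequency counted over a list of terminologies. All function names and the example terms are hypothetical.

```python
import math
from collections import Counter


def naive_score(doc_tokens, query_terms):
    """Toy baseline: add 1 for every query word that occurs in the document."""
    doc = set(doc_tokens)
    return sum(1.0 for term in query_terms for w in term.split() if w in doc)


def normalized_score(doc_tokens, query_terms):
    """Assumed normalization: each term carries a total weight of 1,
    split evenly across its words, so long terms gain no extra mass."""
    doc = set(doc_tokens)
    score = 0.0
    for term in query_terms:
        words = term.split()
        score += sum(1.0 / len(words) for w in words if w in doc)
    return score


def inverse_terminology_frequency(terminologies):
    """Assumed IDF-style statistic: log(N / n_w), where N is the number of
    terminologies and n_w is how many of them contain the word w."""
    n = len(terminologies)
    counts = Counter(w for term in terminologies for w in set(term.split()))
    return {w: math.log(n / c) for w, c in counts.items()}


if __name__ == "__main__":
    # A short query term expanded with a multi-word synonym (illustrative).
    query = ["bse", "bovine spongiform encephalopathy"]
    doc_short = ["bse", "cattle"]
    doc_long = ["bovine", "spongiform", "encephalopathy", "cattle"]

    print(naive_score(doc_short, query), naive_score(doc_long, query))
    print(normalized_score(doc_short, query), normalized_score(doc_long, query))

    itf = inverse_terminology_frequency([
        "bse",
        "bovine spongiform encephalopathy",
        "transmissible spongiform encephalopathy",
    ])
    print(sorted(itf.items(), key=lambda kv: kv[1]))
```

In this toy setting the baseline scores the document with the long form higher simply because it matches more words, while the normalized scorer treats the two synonymous forms about equally. Words shared by many terminologies receive a lower inverse terminology frequency than words that appear in only one.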

Original language: English
Pages (from-to): 1873-1876
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E90-D
Issue number: 11
DOIs: 10.1093/ietisy/e90-d.11.1873
Publication status: Published - 2007 Nov 1

Fingerprint

Terminology
Statistics
Experiments

Keywords

  • Biomedical document retrieval
  • Biomedical terminology
  • Biomedical terminology weighting
  • Query expansion

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Simple weighting techniques for query expansion in biomedical document retrieval. / Song, Young In; Han, Kyoung Soo; Park, So Young; Kim, Sang Bum; Rim, Hae-Chang.

In: IEICE Transactions on Information and Systems, Vol. E90-D, No. 11, 01.11.2007, p. 1873-1876.

Song, Young In ; Han, Kyoung Soo ; Park, So Young ; Kim, Sang Bum ; Rim, Hae-Chang. / Simple weighting techniques for query expansion in biomedical document retrieval. In: IEICE Transactions on Information and Systems. 2007 ; Vol. E90-D, No. 11. pp. 1873-1876.
@article{6598425bc0354094bc7f1893d262d966,
title = "Simple weighting techniques for query expansion in biomedical document retrieval",
abstract = "In this paper, we propose two weighting techniques to improve performances of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model highly ranks a document containing a longer terminology because a longer terminology has more chance to be matched with a query. However, such preference is clearly inappropriate and it often yields an unsatisfactory result. To alleviate the bias weighting problem, we devise a method of normalizing the weights of query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency which is a novel statistics estimated in a query domain. The experiment results on MEDLINE corpus show that our two simple techniques improve the retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.",
keywords = "Biomedical document retrieval, Biomedical terminology, Biomedical terminology weighting, Query expansion",
author = "Song, {Young In} and Han, {Kyoung Soo} and Park, {So Young} and Kim, {Sang Bum} and Rim, Hae-Chang",
year = "2007",
month = "11",
day = "1",
doi = "10.1093/ietisy/e90-d.11.1873",
language = "English",
volume = "E90-D",
pages = "1873--1876",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "11",

}

TY - JOUR

T1 - Simple weighting techniques for query expansion in biomedical document retrieval

AU - Song, Young In

AU - Han, Kyoung Soo

AU - Park, So Young

AU - Kim, Sang Bum

AU - Rim, Hae-Chang

PY - 2007/11/1

Y1 - 2007/11/1

AB - In this paper, we propose two weighting techniques to improve performances of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model highly ranks a document containing a longer terminology because a longer terminology has more chance to be matched with a query. However, such preference is clearly inappropriate and it often yields an unsatisfactory result. To alleviate the bias weighting problem, we devise a method of normalizing the weights of query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency which is a novel statistics estimated in a query domain. The experiment results on MEDLINE corpus show that our two simple techniques improve the retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.

KW - Biomedical document retrieval

KW - Biomedical terminology

KW - Biomedical terminology weighting

KW - Query expansion

UR - http://www.scopus.com/inward/record.url?scp=68249158381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=68249158381&partnerID=8YFLogxK

U2 - 10.1093/ietisy/e90-d.11.1873

DO - 10.1093/ietisy/e90-d.11.1873

M3 - Article

AN - SCOPUS:68249158381

VL - E90-D

SP - 1873

EP - 1876

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 11

ER -