Sentence-based relevance flow analysis for high accuracy retrieval

Jung Tae Lee, Jangwon Seo, Jiwoon Jeon, Hae-Chang Rim

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.
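The idea of a relevance flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the naive sentence splitter, the term-overlap relevance score, the three summary features, and the function names `split_sentences`, `tokenize`, `relevance_flow`, and `flow_features` are all hypothetical. The abstract does not specify how sentence-query relevance is scored or which features are extracted.

```python
import re


def split_sentences(text):
    # Naive splitter on ., ! or ? followed by whitespace (an assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased alphanumeric tokens; no stemming or stopword removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_flow(document, query):
    """Return one sentence-query relevance score per sentence, in order."""
    query_terms = set(tokenize(query))
    flow = []
    for sentence in split_sentences(document):
        terms = set(tokenize(sentence))
        # Hypothetical score: fraction of distinct query terms in the sentence.
        score = len(query_terms & terms) / len(query_terms) if query_terms else 0.0
        flow.append(score)
    return flow


def flow_features(flow):
    """Summarize a relevance flow with a few illustrative features."""
    if not flow:
        return {"mean": 0.0, "peak": 0.0, "peak_position": 0.0}
    peak = max(flow)
    peak_index = flow.index(peak)  # first sentence that reaches the peak
    return {
        "mean": sum(flow) / len(flow),
        "peak": peak,
        # 0.0 means the first sentence, 1.0 means the last sentence.
        "peak_position": peak_index / (len(flow) - 1) if len(flow) > 1 else 0.0,
    }


doc = "Blog search is hard. Ranking models score documents. The weather was nice."
flow = relevance_flow(doc, "ranking models")
features = flow_features(flow)
```

In a setup like the one the abstract outlines, feature vectors of this kind would be passed to a learned ranking function whose output is combined with an existing retrieval score; that learning step is not sketched here.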

Original language: English
Pages (from-to): 1666-1675
Number of pages: 10
Journal: Journal of the American Society for Information Science and Technology
Volume: 62
Issue number: 9
DOI: 10.1002/asi.21564
Publication status: Published - 2011 Sep 1

Fingerprint

Blogs
Information retrieval
ranking
Experiments
weblog
Query
news
lack
ability
performance
Leverage
Sequential patterns
Test collections

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Information Systems
  • Human-Computer Interaction
  • Computer Networks and Communications

Cite this

Sentence-based relevance flow analysis for high accuracy retrieval. / Lee, Jung Tae; Seo, Jangwon; Jeon, Jiwoon; Rim, Hae-Chang.

In: Journal of the American Society for Information Science and Technology, Vol. 62, No. 9, 01.09.2011, p. 1666-1675.

@article{8d20fe08a4084348b0fa44d8f94eca41,
title = "Sentence-based relevance flow analysis for high accuracy retrieval",
abstract = "Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.",
author = "Lee, {Jung Tae} and Seo, Jangwon and Jeon, Jiwoon and Rim, Hae-Chang",
year = "2011",
month = "9",
day = "1",
doi = "10.1002/asi.21564",
language = "English",
volume = "62",
pages = "1666--1675",
journal = "Journal of the American Society for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "9",
}

TY - JOUR

T1 - Sentence-based relevance flow analysis for high accuracy retrieval

AU - Lee, Jung Tae

AU - Seo, Jangwon

AU - Jeon, Jiwoon

AU - Rim, Hae-Chang

PY - 2011/9/1

Y1 - 2011/9/1

AB - Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.

UR - http://www.scopus.com/inward/record.url?scp=80051668901&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051668901&partnerID=8YFLogxK

U2 - 10.1002/asi.21564

DO - 10.1002/asi.21564

M3 - Article

AN - SCOPUS:80051668901

VL - 62

SP - 1666

EP - 1675

JO - Journal of the American Society for Information Science and Technology

JF - Journal of the American Society for Information Science and Technology

SN - 2330-1635

IS - 9

ER -