Comparative study of name disambiguation problem using a scalable blocking-based framework

Byung Won On, Jaewoo Kang, Dongwon Lee, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

82 Citations (Scopus)

Abstract

In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., "Vannevar Bush" and "V. Bush"). Our study is based on a scalable two-step framework, where step 1 is to substantially reduce the number of candidates via blocking, and step 2 is to measure the distance of two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are scalable and effective to disambiguate author names in citations.
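The two-step framework described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four blocking methods or the seven distance measures, so the blocking key (first initial plus last name) and the Jaccard distance over coauthor sets used here are assumptions chosen only for illustration, as are the function names, the threshold, and the toy coauthor data.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name: str) -> str:
    """Hypothetical blocking key: first initial + last name token, lowercased."""
    tokens = name.replace(".", " ").split()
    return f"{tokens[0][0]}_{tokens[-1]}".lower()


def build_blocks(names):
    """Step 1: group names so that only names sharing a key are compared."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    return blocks


def jaccard_distance(a: set, b: set) -> float:
    """Assumed distance measure: 1 - |a & b| / |a | b| over coauthor sets."""
    if not a and not b:
        return 1.0  # no coauthor evidence: treat the pair as maximally distant
    return 1.0 - len(a & b) / len(a | b)


def candidate_pairs(coauthors: dict, threshold: float = 0.5):
    """Step 2: within each block, keep pairs whose coauthor distance is small.

    `coauthors` maps an author name string to the set of its coauthor names.
    The threshold is an arbitrary illustrative value.
    """
    matches = []
    for block in build_blocks(coauthors).values():
        for x, y in combinations(sorted(block), 2):
            if jaccard_distance(coauthors[x], coauthors[y]) <= threshold:
                matches.append((x, y))
    return matches


# Toy data (invented for illustration, not taken from the paper's data sets).
toy = {
    "Vannevar Bush": {"A. Author", "B. Author", "C. Author"},
    "V. Bush": {"A. Author", "B. Author"},
    "J. Smith": {"A. Author"},
}
print(candidate_pairs(toy))
```

Blocking is what makes the approach scale: pairwise distances are computed only inside each block rather than across all names, so "J. Smith" is never compared with either "Bush" variant in this toy run.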

Original language: English
Title of host publication: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Pages: 344-353
Number of pages: 10
Publication status: Published - 2005
Externally published: Yes
Event: 5th ACM/IEEE Joint Conference on Digital Libraries - Digital Libraries: Cyberinfrastructure for Research and Education - Denver, CO, United States
Duration: 2005 Jun 7 to 2005 Jun 11

Keywords

  • Blocking
  • Measuring Distances
  • Name Disambiguation

ASJC Scopus subject areas

  • Engineering(all)

Cite this

On, B. W., Kang, J., Lee, D., & Mitra, P. (2005). Comparative study of name disambiguation problem using a scalable blocking-based framework. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (pp. 344-353)

@inproceedings{c5d9b94b01db47208cfd2cb924c8273e,
title = "Comparative study of name disambiguation problem using a scalable blocking-based framework",
abstract = "In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., {"}Vannevar Bush{"} and {"}V. Bush{"}). Our study is based on a scalable two-step framework, where step 1 is to substantially reduce the number of candidates via blocking, and step 2 is to measure the distance of two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are scalable and effective to disambiguate author names in citations.",
keywords = "Blocking, Measuring Distances, Name Disambiguation",
author = "On, {Byung Won} and Jaewoo Kang and Dongwon Lee and Prasenjit Mitra",
year = "2005",
language = "English",
pages = "344--353",
booktitle = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
}

TY - GEN
T1 - Comparative study of name disambiguation problem using a scalable blocking-based framework
AU - On, Byung Won
AU - Kang, Jaewoo
AU - Lee, Dongwon
AU - Mitra, Prasenjit
PY - 2005
Y1 - 2005
AB - In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., "Vannevar Bush" and "V. Bush"). Our study is based on a scalable two-step framework, where step 1 is to substantially reduce the number of candidates via blocking, and step 2 is to measure the distance of two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are scalable and effective to disambiguate author names in citations.
KW - Blocking
KW - Measuring Distances
KW - Name Disambiguation
UR - http://www.scopus.com/inward/record.url?scp=27544460727&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27544460727&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:27544460727
SP - 344
EP - 353
BT - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ER -