Improving grouped-entity resolution using Quasi-Cliques

Byung Won On, Ergin Elmacioglu, Dongwon Lee, Jaewoo Kang, Jian Pei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Citations (Scopus)

Abstract

The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real-world entity, is essential in many applications. In this paper, we focus on resolving entities that contain a group of related elements (e.g., an author entity with a list of citations, a singer entity with a song list, or an intermediate result of a GROUP BY SQL query). Such entities, called grouped-entities, occur frequently in many applications. Previous approaches to grouped-entity resolution often rely on textual similarity and produce a large number of false positives. As a complementary technique, we present our experience of applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. Our approach exploits contextual information mined from the group of elements per entity in addition to syntactic similarity. Extensive experiments verify that our proposal improves precision and recall by up to 83% when used together with a variety of existing ER solutions, and never worsens them.
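
The abstract does not spell out how contextual information is mined, so the following is only an illustrative sketch, not the authors' implementation. It assumes the commonly used definition of a γ-quasi-clique (every vertex in the set is adjacent to at least ⌈γ·(n−1)⌉ of the other n−1 vertices) and builds a co-occurrence graph from the element groups of one entity, for example the co-author lists of an author's citations. The function names `cooccurrence_graph`, `is_quasi_clique`, and `largest_quasi_clique_size` are hypothetical, and the brute-force search is suitable for toy graphs only.

```python
from itertools import combinations
from math import ceil


def cooccurrence_graph(groups):
    """Build an undirected graph linking items that appear in the same group.

    `groups` is an iterable of item lists (assumed input shape), e.g. the
    co-author lists of the citations attached to one author entity.
    """
    adj = {}
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def is_quasi_clique(adj, nodes, gamma):
    """Check whether `nodes` induce a gamma-quasi-clique in `adj`.

    Assumed definition: each vertex must be adjacent to at least
    ceil(gamma * (n - 1)) of the other n - 1 vertices in the set.
    """
    nodes = set(nodes)
    if len(nodes) < 2:
        return False
    need = ceil(gamma * (len(nodes) - 1))
    return all(len(adj.get(v, set()) & nodes) >= need for v in nodes)


def largest_quasi_clique_size(adj, gamma, min_size=2):
    """Exhaustively find the size of the largest gamma-quasi-clique.

    Exponential in the number of vertices; a stand-in for a real miner.
    """
    verts = sorted(adj)
    for k in range(len(verts), min_size - 1, -1):
        for subset in combinations(verts, k):
            if is_quasi_clique(adj, subset, gamma):
                return k
    return 0


if __name__ == "__main__":
    # Hypothetical co-author lists for one author entity.
    groups = [["A", "B", "C"], ["C", "D"]]
    graph = cooccurrence_graph(groups)
    print(is_quasi_clique(graph, {"A", "B", "C"}, 1.0))
    print(largest_quasi_clique_size(graph, 0.5))
```

One plausible use, again an assumption, is to treat the size of the largest quasi-clique shared by two entities' context graphs as an extra similarity signal alongside a textual score, which is the general idea the abstract describes as exploiting context in addition to syntactic similarity.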

Original language: English
Title of host publication: Proceedings - Sixth International Conference on Data Mining, ICDM 2006
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1008-1015
Number of pages: 8
ISBN (Print): 0769527019, 9780769527017
DOIs: https://doi.org/10.1109/ICDM.2006.85
Publication status: Published - 2006 Jan 1
Event: 6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China
Duration: 2006 Dec 18 to 2006 Dec 22

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 6th International Conference on Data Mining, ICDM 2006
Country: China
City: Hong Kong
Period: 06/12/18 to 06/12/22

ASJC Scopus subject areas

  • Engineering(all)

Cite this

On, B. W., Elmacioglu, E., Lee, D., Kang, J., & Pei, J. (2006). Improving grouped-entity resolution using Quasi-Cliques. In Proceedings - Sixth International Conference on Data Mining, ICDM 2006 (pp. 1008-1015). [4053144] (Proceedings - IEEE International Conference on Data Mining, ICDM). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2006.85

Improving grouped-entity resolution using Quasi-Cliques. / On, Byung Won; Elmacioglu, Ergin; Lee, Dongwon; Kang, Jaewoo; Pei, Jian.

Proceedings - Sixth International Conference on Data Mining, ICDM 2006. Institute of Electrical and Electronics Engineers Inc., 2006. p. 1008-1015 4053144 (Proceedings - IEEE International Conference on Data Mining, ICDM).

On, BW, Elmacioglu, E, Lee, D, Kang, J & Pei, J 2006, Improving grouped-entity resolution using Quasi-Cliques. in Proceedings - Sixth International Conference on Data Mining, ICDM 2006., 4053144, Proceedings - IEEE International Conference on Data Mining, ICDM, Institute of Electrical and Electronics Engineers Inc., pp. 1008-1015, 6th International Conference on Data Mining, ICDM 2006, Hong Kong, China, 06/12/18. https://doi.org/10.1109/ICDM.2006.85
On BW, Elmacioglu E, Lee D, Kang J, Pei J. Improving grouped-entity resolution using Quasi-Cliques. In Proceedings - Sixth International Conference on Data Mining, ICDM 2006. Institute of Electrical and Electronics Engineers Inc. 2006. p. 1008-1015. 4053144. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2006.85
On, Byung Won ; Elmacioglu, Ergin ; Lee, Dongwon ; Kang, Jaewoo ; Pei, Jian. / Improving grouped-entity resolution using Quasi-Cliques. Proceedings - Sixth International Conference on Data Mining, ICDM 2006. Institute of Electrical and Electronics Engineers Inc., 2006. pp. 1008-1015 (Proceedings - IEEE International Conference on Data Mining, ICDM).
@inproceedings{21e4484fe2d04fc7b46e909740a09bd8,
title = "Improving grouped-entity resolution using Quasi-Cliques",
abstract = "The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real world entity, is essential in many applications. In this paper, in particular, we focus on resolving entities that contain a group of related elements in them (e.g., an author entity with a list of citations, a singer entity with song list, or an intermediate result by GROUP BY SQL query). Such entities, named as grouped-entities, frequently occur in many applications. The previous approaches toward grouped-entity resolution often rely on textual similarity, and produce a large number of false positives. As a complementing technique, in this paper, we present our experience of applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. Our approach exploits contextual information mined from the group of elements per entity in addition to syntactic similarity. Extensive experiments verify that our proposal improves precision and recall up to 83{\%} when used together with a variety of existing ER solutions, but never worsens them.",
author = "On, {Byung Won} and Ergin Elmacioglu and Dongwon Lee and Jaewoo Kang and Jian Pei",
year = "2006",
month = "1",
day = "1",
doi = "10.1109/ICDM.2006.85",
language = "English",
isbn = "0769527019",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1008--1015",
booktitle = "Proceedings - Sixth International Conference on Data Mining, ICDM 2006",

}

TY - CONF

T1 - Improving grouped-entity resolution using Quasi-Cliques

AU - On, Byung Won

AU - Elmacioglu, Ergin

AU - Lee, Dongwon

AU - Kang, Jaewoo

AU - Pei, Jian

PY - 2006/1/1

Y1 - 2006/1/1

AB - The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real world entity, is essential in many applications. In this paper, in particular, we focus on resolving entities that contain a group of related elements in them (e.g., an author entity with a list of citations, a singer entity with song list, or an intermediate result by GROUP BY SQL query). Such entities, named as grouped-entities, frequently occur in many applications. The previous approaches toward grouped-entity resolution often rely on textual similarity, and produce a large number of false positives. As a complementing technique, in this paper, we present our experience of applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. Our approach exploits contextual information mined from the group of elements per entity in addition to syntactic similarity. Extensive experiments verify that our proposal improves precision and recall up to 83% when used together with a variety of existing ER solutions, but never worsens them.

UR - http://www.scopus.com/inward/record.url?scp=47249101877&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=47249101877&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2006.85

DO - 10.1109/ICDM.2006.85

M3 - Conference contribution

AN - SCOPUS:47249101877

SN - 0769527019

SN - 9780769527017

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 1008

EP - 1015

BT - Proceedings - Sixth International Conference on Data Mining, ICDM 2006

PB - Institute of Electrical and Electronics Engineers Inc.

ER -