Identifying value mappings for data integration: An unsupervised approach

Jaewoo Kang, Dongwon Lee, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most current data cleaning solutions assume that data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. "Two-door front wheel drive" can be represented as "2DR-FWD" or "R2FD", or even as "CAR TYPE 3" in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.
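The invariance claim in the abstract can be illustrated with a small sketch. This is not the authors' two-step algorithm, which the abstract does not specify. It only shows, using mutual information as one assumed example of a dependency statistic, that such a statistic depends on how values co-occur rather than on the strings used to write them. The records and the codes `B1`, `B2`, `D7`, `D9` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between the two fields of (x, y) records."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (left[x] * right[y]))
    return mi


# Hypothetical toy records: (body style, drivetrain) as one source might store them.
source_a = [
    ("Two-door", "front wheel drive"),
    ("Two-door", "front wheel drive"),
    ("Four-door", "rear wheel drive"),
    ("Four-door", "front wheel drive"),
]

# The same records under an opaque, made-up encoding used by a second source.
recode_body = {"Two-door": "B1", "Four-door": "B2"}
recode_drive = {"front wheel drive": "D7", "rear wheel drive": "D9"}
source_b = [(recode_body[b], recode_drive[d]) for b, d in source_a]

mi_a = mutual_information(source_a)
mi_b = mutual_information(source_b)

# The two values agree even though the sources share no tokens.
print(math.isclose(mi_a, mi_b))
```

Because the statistic is computed from co-occurrence counts alone, renaming the values leaves it unchanged. That property is what allows values to be compared across sources that share no textual similarity.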

Original language: English
Title of host publication: Web Information Systems Engineering, WISE 2005 - 6th International Conference on Web Information Systems Engineering, Proceedings
Pages: 544-551
Number of pages: 8
DOIs: https://doi.org/10.1007/11581062_46
Publication status: Published - 2005
Event: 6th International Conference on Web Information Systems Engineering, WISE 2005 - New York, NY, United States
Duration: 2005 Nov 20 → 2005 Nov 22

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3806 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Web Information Systems Engineering, WISE 2005
Country: United States
City: New York, NY
Period: 05/11/20 → 05/11/22

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Kang, J., Lee, D., & Mitra, P. (2005). Identifying value mappings for data integration: An unsupervised approach. In Web Information Systems Engineering, WISE 2005 - 6th International Conference on Web Information Systems Engineering, Proceedings (pp. 544-551). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3806 LNCS). https://doi.org/10.1007/11581062_46