Schema matching using interattribute dependencies

Jaewoo Kang, Jeffrey F. Naughton

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema-matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are opaque or very difficult to interpret. In our previous work, we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph-matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph-matching problem in the second step. The problem instance we are facing is the weighted graph-matching problem to which no efficient solution has yet been found. In this paper, we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.
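The two steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the information-theoretic measure, the graph distance, or the approximation algorithms, so this sketch assumes mutual information as the dependency measure, Euclidean distance between weighted adjacency matrices as the matching objective, and a simple pairwise-swap hill climb as a stand-in approximation. All function names and the toy tables are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def entropy(values):
    """Shannon entropy (in bits) of a column of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def dependency_graph(table):
    """Step 1 (assumed measure): matrix whose entry [i][j] is I(col_i; col_j).

    The diagonal holds I(X;X) = H(X), the entropy of the column itself.
    """
    cols = list(table.values())
    return [[mutual_information(a, b) for b in cols] for a in cols]


def distance(g1, g2, mapping):
    """Euclidean distance between g1 and g2 when node i maps to mapping[i]."""
    n = len(g1)
    return math.sqrt(sum(
        (g1[i][j] - g2[mapping[i]][mapping[j]]) ** 2
        for i in range(n) for j in range(n)
    ))


def match_exhaustive(g1, g2):
    """Step 2, exact: try every one-to-one mapping (factorial cost)."""
    return min(permutations(range(len(g1))), key=lambda m: distance(g1, g2, m))


def match_hill_climbing(g1, g2):
    """Step 2, approximate (assumed heuristic): keep swaps that reduce distance."""
    mapping = list(range(len(g1)))
    best = distance(g1, g2, mapping)
    improved = True
    while improved:
        improved = False
        for i in range(len(mapping)):
            for j in range(i + 1, len(mapping)):
                mapping[i], mapping[j] = mapping[j], mapping[i]
                d = distance(g1, g2, mapping)
                if d < best - 1e-12:
                    best, improved = d, True
                else:
                    mapping[i], mapping[j] = mapping[j], mapping[i]  # undo swap
    return tuple(mapping)


# Hypothetical toy tables: t2 holds the same data as t1 with the columns
# reordered and every value re-encoded, so names and values are opaque.
t1 = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 1, 2, 3, 0, 1, 2, 3],
    "c": [0, 0, 0, 0, 0, 0, 0, 1],
}
t2 = {
    "x": ["p", "p", "p", "p", "p", "p", "p", "q"],  # re-encoded copy of c
    "y": ["u", "u", "u", "u", "v", "v", "v", "v"],  # re-encoded copy of a
    "z": ["k", "l", "m", "n", "k", "l", "m", "n"],  # re-encoded copy of b
}
g1, g2 = dependency_graph(t1), dependency_graph(t2)
print(match_exhaustive(g1, g2))
print(match_hill_climbing(g1, g2))
```

On these toy tables both matchers recover the mapping a to y, b to z, c to x, using only the dependency structure and never the column names or values. The exhaustive search grows factorially with the number of attributes, which is the cost the abstract's approximation algorithms are meant to avoid; the hill climb here is only one possible heuristic and can stall in a local minimum on harder inputs.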

Original language: English
Article number: 4527243
Pages (from-to): 1393-1407
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 20
Issue number: 10
DOI: 10.1109/TKDE.2008.100
Publication status: Published - 2008 Oct 1

Fingerprint

Approximation algorithms
Computational complexity
Personnel

Keywords

  • Attribute dependency
  • Graph matching
  • Schema matching

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Schema matching using interattribute dependencies. / Kang, Jaewoo; Naughton, Jeffrey F.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 10, 4527243, 01.10.2008, p. 1393-1407.


@article{07456ccfad2c47e789edc8aa8d5391b6,
title = "Schema matching using interattribute dependencies",
abstract = "Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema-matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are opaque or very difficult to interpret. In our previous work, we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph-matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph-matching problem in the second step. The problem instance we are facing is the weighted graph-matching problem to which no efficient solution has yet been found. In this paper, we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.",
keywords = "Attribute dependency, Graph matching, Schema matching",
author = "Kang, Jaewoo and Naughton, Jeffrey F.",
year = "2008",
month = "10",
day = "1",
doi = "10.1109/TKDE.2008.100",
language = "English",
volume = "20",
pages = "1393--1407",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "10",
}

TY - JOUR

T1 - Schema matching using interattribute dependencies

AU - Kang, Jaewoo

AU - Naughton, Jeffrey F.

PY - 2008/10/1

Y1 - 2008/10/1

AB - Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema-matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are opaque or very difficult to interpret. In our previous work, we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph-matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph-matching problem in the second step. The problem instance we are facing is the weighted graph-matching problem to which no efficient solution has yet been found. In this paper, we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.

KW - Attribute dependency

KW - Graph matching

KW - Schema matching

UR - http://www.scopus.com/inward/record.url?scp=50649089608&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50649089608&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2008.100

DO - 10.1109/TKDE.2008.100

M3 - Article

VL - 20

SP - 1393

EP - 1407

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 10

M1 - 4527243

ER -