Incremental data integration based on hierarchical metadata registry with data visibility

Dongwon Jeong, Doo Kwon Baik

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Metadata-based data integration has been studied extensively. However, existing approaches incur too high a cost to build an initial guideline. The most important reason is that previous research has not seriously considered the properties of the target domain, such as the data level and the user level. In practice, it is difficult to create a standardized guideline for the entire data set when the available cost is restricted. Thus, a set of data to be integrated should be selected first. However, most databases have no statistical information that can be used to select such a set of data according to its usability. In this paper, we propose the LOG (localization-based global metadata registry) methodology, which builds a guideline and integrates databases progressively while considering the domain properties. The key idea is that the priorities of the databases to be integrated are determined by their relationship to the domain properties. We also demonstrate an implementation by applying the methodology to actual databases at the Korea Institute of Science and Technology Information, which builds and manages a considerable number of science and technology databases in Korea. LOG provides an incremental method for building a metadata registry and also supports a progressive data integration mechanism for existing distributed databases. It is especially effective and efficient at creating a standard guideline when the given cost is restricted.

Original language: English
Pages (from-to): 147-181
Number of pages: 35
Journal: Information Sciences
Volume: 162
Issue number: 3-4
DOI: 10.1016/j.ins.2003.09.008
Publication status: Published - 2004 Jun 4

Keywords

  • Data visibility
  • Hierarchical MDR
  • Incremental data integration
  • MDR
  • Metadata registry

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Electrical and Electronic Engineering
  • Statistics and Probability

Cite this

Incremental data integration based on hierarchical metadata registry with data visibility. / Jeong, Dongwon; Baik, Doo Kwon.

In: Information Sciences, Vol. 162, No. 3-4, 04.06.2004, p. 147-181.

@article{47cccec8d5bd4d7da48f207acdae6449,
title = "Incremental data integration based on hierarchical metadata registry with data visibility",
keywords = "Data visibility, Hierarchical MDR, Incremental data integration, MDR, Metadata registry",
author = "Jeong, Dongwon and Baik, {Doo Kwon}",
year = "2004",
month = "6",
day = "4",
doi = "10.1016/j.ins.2003.09.008",
language = "English",
volume = "162",
pages = "147--181",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
number = "3-4",
}

TY - JOUR
T1 - Incremental data integration based on hierarchical metadata registry with data visibility
AU - Jeong, Dongwon
AU - Baik, Doo Kwon
PY - 2004/6/4
Y1 - 2004/6/4
KW - Data visibility
KW - Hierarchical MDR
KW - Incremental data integration
KW - MDR
KW - Metadata registry
UR - http://www.scopus.com/inward/record.url?scp=2442639364&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=2442639364&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2003.09.008
DO - 10.1016/j.ins.2003.09.008
M3 - Article
VL - 162
SP - 147
EP - 181
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
IS - 3-4
ER -