Hierarchically organized skew-tolerant histograms for geographic data objects

Yohan J. Roh, Jae H. Kim, Yon Dohn Chung, Jin Hyun Son, Myoung H. Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

15 Citations (Scopus)

Abstract

Histograms have been widely used for fast estimation of query result sizes in query optimization. In this paper, we propose a new histogram method, called the Skew-Tolerant Histogram (STHistogram), for two- or three-dimensional geographic data objects, which are used in many real-world applications. The proposed method provides significantly enhanced accuracy in a robust manner, even for data sets with a highly skewed distribution. Our method detects hotspots present in various parts of a data set and exploits them in organizing histogram buckets. For this purpose, we first define the concept of a hotspot and provide an algorithm that efficiently extracts hotspots from a given data set. Then, we present our histogram construction method, which utilizes the hotspot information. We also describe how to estimate query result sizes using the proposed histogram. We show through extensive performance experiments that the proposed method outperforms existing methods.
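To make the estimation problem in the abstract concrete, the following is a minimal sketch of histogram-based result-size estimation. It is not the STHistogram algorithm: the paper's hotspot definition, hotspot extraction, and hierarchical bucket organization are not reproduced here. The sketch assumes a plain equi-width grid, objects reduced to 2D points, and a uniform distribution inside each bucket; all names (`Bucket`, `build_grid_histogram`, `estimate_result_size`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Bucket:
    rect: Rect   # region covered by the bucket
    count: int   # number of points that fall inside it


def build_grid_histogram(points: List[Point], extent: Rect,
                         nx: int, ny: int) -> List[Bucket]:
    """Build an equi-width nx-by-ny grid histogram (assumed baseline, not STHistogram)."""
    xmin, ymin, xmax, ymax = extent
    w = (xmax - xmin) / nx
    h = (ymax - ymin) / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Clamp so that points on the upper boundary land in the last cell.
        i = min(int((x - xmin) / w), nx - 1)
        j = min(int((y - ymin) / h), ny - 1)
        counts[j][i] += 1
    return [
        Bucket((xmin + i * w, ymin + j * h,
                xmin + (i + 1) * w, ymin + (j + 1) * h), counts[j][i])
        for j in range(ny) for i in range(nx)
    ]


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def estimate_result_size(buckets: List[Bucket], query: Rect) -> float:
    """Estimate how many points a range query returns, assuming each
    bucket's points are spread uniformly over the bucket's area."""
    total = 0.0
    for b in buckets:
        area = (b.rect[2] - b.rect[0]) * (b.rect[3] - b.rect[1])
        if area > 0:
            total += b.count * overlap_area(b.rect, query) / area
    return total
```

With four points clustered near one corner of the unit square and one point far away, a single-bucket histogram spreads the count evenly and underestimates queries over the cluster, while a finer grid recovers the count for queries aligned with its buckets. Even the finer grid misestimates a query smaller than the bucket that holds the cluster, which is the kind of skew-induced error the abstract says hotspot-aware buckets are meant to reduce.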

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Pages: 627-638
Number of pages: 12
DOI: https://doi.org/10.1145/1807167.1807236
Publication status: Published - 2010 Jul 23
Event: 2010 International Conference on Management of Data, SIGMOD '10 - Indianapolis, IN, United States
Duration: 2010 Jun 6 to 2010 Jun 11

Other

Other: 2010 International Conference on Management of Data, SIGMOD '10
Country: United States
City: Indianapolis, IN
Period: 2010 Jun 6 to 2010 Jun 11

Keywords

  • histograms
  • query optimization
  • spatial databases

ASJC Scopus subject areas

  • Information Systems
  • Software

Cite this

Roh, Y. J., Kim, J. H., Chung, Y. D., Son, J. H., & Kim, M. H. (2010). Hierarchically organized skew-tolerant histograms for geographic data objects. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 627-638) https://doi.org/10.1145/1807167.1807236

@inproceedings{055bd55520ef46f6819fb6370f4938de,
title = "Hierarchically organized skew-tolerant histograms for geographic data objects",
keywords = "histograms, query optimization, spatial databases",
author = "Roh, {Yohan J.} and Kim, {Jae H.} and Chung, {Yon Dohn} and Son, {Jin Hyun} and Kim, {Myoung H.}",
year = "2010",
month = "7",
day = "23",
doi = "10.1145/1807167.1807236",
language = "English",
isbn = "9781450300322",
pages = "627--638",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

}