SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data

Hyunsik Choi, Jihoon Son, Yonghyun Cho, Min Kyoung Sung, Yon Dohn Chung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

36 Citations (Scopus)

Abstract

RDF is a data model for representing labeled directed graphs, and it is an important building block of the semantic web. Owing to its flexibility and applicability, RDF has been used in applications such as the semantic web, bioinformatics, and social networks. Large-scale graph datasets are very common in these applications, but existing techniques do not manage them effectively. In this paper, we present SPIDER, a scalable, efficient query processing system for RDF data based on the well-known parallel/distributed computing framework Hadoop. SPIDER consists of two major modules: (1) the graph data loader and (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of the data across multiple servers. The query processor parses the user query and distributes sub queries to cluster nodes; the results of the sub queries from multiple servers are then gathered (and refined if necessary) and delivered to the user. Both modules use the MapReduce framework of Hadoop. In addition, our system supports some features of the SPARQL query language. This prototype will serve as a foundation for developing real applications with large-scale RDF graph data.
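The loader and query processor roles described in the abstract can be illustrated with a small, single-process sketch. It is not SPIDER's implementation: the abstract does not say how data is partitioned or how queries are split, so hashing on the subject, the `None` wildcard, and the function names `load`, `run_subquery`, and `query` are all assumptions made for illustration, and plain Python lists stand in for Hadoop servers.

```python
from collections import defaultdict
from zlib import crc32


def load(triples, num_servers):
    """Loader: place each triple on a server chosen by hashing its subject.

    Subject hashing is an assumed placement policy, not the one used by SPIDER.
    """
    servers = defaultdict(list)
    for s, p, o in triples:
        servers[crc32(s.encode("utf-8")) % num_servers].append((s, p, o))
    return servers


def match(pattern, triple):
    """A pattern term of None is a wildcard (it stands in for a query variable)."""
    return all(want is None or want == got for want, got in zip(pattern, triple))


def run_subquery(pattern, partition):
    """Map-like step: evaluate one triple pattern against one server's local data."""
    return [t for t in partition if match(pattern, t)]


def query(pattern, servers):
    """Gather step: send the same sub query to every server and merge the results."""
    results = []
    for partition in servers.values():
        results.extend(run_subquery(pattern, partition))
    return sorted(results)


# Hypothetical toy data, not taken from the paper.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
]

if __name__ == "__main__":
    servers = load(triples, num_servers=2)
    print(query((None, "knows", None), servers))
```

Because every server receives the same sub query and the merged result is sorted, the answer does not depend on which server holds which triple; a real system would also need joins across triple patterns, which this sketch leaves out.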

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 2087-2088
Number of pages: 2
DOI: https://doi.org/10.1145/1645953.1646315
Publication status: Published - 2009 Dec 1
Event: ACM 18th International Conference on Information and Knowledge Management, CIKM 2009 - Hong Kong, China
Duration: 2009 Nov 2 → 2009 Nov 6

Other

Other: ACM 18th International Conference on Information and Knowledge Management, CIKM 2009
Country: China
City: Hong Kong
Period: 09/11/2 → 09/11/6

Fingerprint

Query
Evaluation
Graph
Hadoop
Module
Semantic web
Node
Distributed computing
Bioinformatics
Query language
MapReduce
Directed graph
Prototype
Social networks
Query processing

Keywords

  • Distributed
  • RDF
  • Semantic web
  • Triple store

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Choi, H., Son, J., Cho, Y., Sung, M. K., & Chung, Y. D. (2009). SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data. In International Conference on Information and Knowledge Management, Proceedings (pp. 2087-2088) https://doi.org/10.1145/1645953.1646315

SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data. / Choi, Hyunsik; Son, Jihoon; Cho, Yonghyun; Sung, Min Kyoung; Chung, Yon Dohn.

International Conference on Information and Knowledge Management, Proceedings. 2009. p. 2087-2088.

Choi, H, Son, J, Cho, Y, Sung, MK & Chung, YD 2009, SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data. in International Conference on Information and Knowledge Management, Proceedings. pp. 2087-2088, ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, 09/11/2. https://doi.org/10.1145/1645953.1646315
Choi H, Son J, Cho Y, Sung MK, Chung YD. SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data. In International Conference on Information and Knowledge Management, Proceedings. 2009. p. 2087-2088 https://doi.org/10.1145/1645953.1646315
Choi, Hyunsik ; Son, Jihoon ; Cho, Yonghyun ; Sung, Min Kyoung ; Chung, Yon Dohn. / SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data. International Conference on Information and Knowledge Management, Proceedings. 2009. pp. 2087-2088
@inproceedings{e650ceb014034ca08d4f91d03674c659,
title = "SPIDER: A system for scalable, parallel/distributed evaluation of large-scale RDF data",
abstract = "RDF is a data model for representing labeled directed graphs, and it is used as an important building block of semantic web. Due to its flexibility and applicability, RDF has been used in applications, such as semantic web, bioinformatics, and social networks. In these applications, large-scale graph datasets are very common. However, existing techniques are not effectively managing them. In this paper, we present a scalable, efficient query processing system for RDF data, named SPIDER, based on the well-known parallel/distributed computing framework, Hadoop. SPIDER consists of two major modules (1) the graph data loader, (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of data over multiple servers. The query processor parses the user query and distributes sub queries to cluster nodes. Also, the results of sub queries from multiple servers are gathered (and refined if necessary) and delivered to the user. Both modules utilize the MapReduce framework of Hadoop. In addition, our system supports some features of SPARQL query language. This prototype will be foundation to develop real applications with large-scale RDF graph data.",
keywords = "Distributed, RDF, Semantic web, Triple store",
author = "Hyunsik Choi and Jihoon Son and Yonghyun Cho and Sung, {Min Kyoung} and Chung, {Yon Dohn}",
year = "2009",
month = "12",
day = "1",
doi = "10.1145/1645953.1646315",
language = "English",
isbn = "9781605585123",
pages = "2087--2088",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",

}

TY - GEN

T1 - SPIDER

T2 - A system for scalable, parallel/distributed evaluation of large-scale RDF data

AU - Choi, Hyunsik

AU - Son, Jihoon

AU - Cho, Yonghyun

AU - Sung, Min Kyoung

AU - Chung, Yon Dohn

PY - 2009/12/1

Y1 - 2009/12/1

N2 - RDF is a data model for representing labeled directed graphs, and it is used as an important building block of semantic web. Due to its flexibility and applicability, RDF has been used in applications, such as semantic web, bioinformatics, and social networks. In these applications, large-scale graph datasets are very common. However, existing techniques are not effectively managing them. In this paper, we present a scalable, efficient query processing system for RDF data, named SPIDER, based on the well-known parallel/distributed computing framework, Hadoop. SPIDER consists of two major modules (1) the graph data loader, (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of data over multiple servers. The query processor parses the user query and distributes sub queries to cluster nodes. Also, the results of sub queries from multiple servers are gathered (and refined if necessary) and delivered to the user. Both modules utilize the MapReduce framework of Hadoop. In addition, our system supports some features of SPARQL query language. This prototype will be foundation to develop real applications with large-scale RDF graph data.

AB - RDF is a data model for representing labeled directed graphs, and it is used as an important building block of semantic web. Due to its flexibility and applicability, RDF has been used in applications, such as semantic web, bioinformatics, and social networks. In these applications, large-scale graph datasets are very common. However, existing techniques are not effectively managing them. In this paper, we present a scalable, efficient query processing system for RDF data, named SPIDER, based on the well-known parallel/distributed computing framework, Hadoop. SPIDER consists of two major modules (1) the graph data loader, (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of data over multiple servers. The query processor parses the user query and distributes sub queries to cluster nodes. Also, the results of sub queries from multiple servers are gathered (and refined if necessary) and delivered to the user. Both modules utilize the MapReduce framework of Hadoop. In addition, our system supports some features of SPARQL query language. This prototype will be foundation to develop real applications with large-scale RDF graph data.

KW - Distributed

KW - RDF

KW - Semantic web

KW - Triple store

UR - http://www.scopus.com/inward/record.url?scp=74549174073&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=74549174073&partnerID=8YFLogxK

U2 - 10.1145/1645953.1646315

DO - 10.1145/1645953.1646315

M3 - Conference contribution

AN - SCOPUS:74549174073

SN - 9781605585123

SP - 2087

EP - 2088

BT - International Conference on Information and Knowledge Management, Proceedings

ER -