Skew-tolerant key distribution for load balancing in MapReduce

Jihoon Son, Hyunsik Choi, Yon Dohn Chung

Research output: Contribution to journal, Article

4 Citations (Scopus)

Abstract

MapReduce is a parallel processing framework for large-scale data. In the reduce phase, MapReduce employs a hash scheme to distribute data sharing the same key across cluster nodes. However, this approach is not robust to skewed data distributions. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes so as to balance their workloads. We implemented the proposed method on Hadoop. Through experiments, we evaluate its performance in comparison with the conventional method.
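The contrast described in the abstract can be illustrated with a small sketch. The abstract does not spell out the authors' algorithm, so the load-aware function below is only an assumed stand-in: a generic greedy heuristic that places the heaviest keys first on the least-loaded reducer. The function names, the CRC32-modulo rule standing in for the hash scheme, and the workload are all hypothetical.

```python
import heapq
import zlib


def hash_partition(key_counts, num_reducers):
    """Conventional scheme: the reducer is chosen from the key's hash alone."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # crc32 is used instead of hash() so the result is stable across runs.
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[reducer] += count
    return loads


def greedy_partition(key_counts, num_reducers):
    """Assumed load-aware sketch: heaviest key first, to the least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending record count, breaking ties by key for determinism.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return assignment, loads


# Hypothetical skewed workload: one hot key plus many small keys.
key_counts = {"hot": 1000}
key_counts.update({f"k{i}": 10 for i in range(100)})

hash_loads = hash_partition(key_counts, 4)
assignment, greedy_loads = greedy_partition(key_counts, 4)
print("hash loads:  ", sorted(hash_loads))
print("greedy loads:", sorted(greedy_loads))
```

All records of a key still go to a single reducer in both functions, so the hot key sets a floor on the heaviest reducer's load. The greedy variant only avoids stacking further keys on top of it, whereas the hash rule ignores key sizes entirely.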

Original language: English
Pages (from-to): 677-680
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E95-D
Issue number: 2
DOIs: 10.1587/transinf.E95.D.677
Publication status: Published - 2012 Feb 1

Fingerprint

  • Resource allocation
  • Processing
  • Experiments

Keywords

  • Key distribution
  • Load balance
  • MapReduce
  • Skew-tolerance

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Skew-tolerant key distribution for load balancing in MapReduce. / Son, Jihoon; Choi, Hyunsik; Chung, Yon Dohn.

In: IEICE Transactions on Information and Systems, Vol. E95-D, No. 2, 01.02.2012, p. 677-680.

@article{0e101a08840841af9dae378853310e9d,
title = "Skew-tolerant key distribution for load balancing in {MapReduce}",
abstract = "MapReduce is a parallel processing framework for large scale data. In the reduce phase, MapReduce employs the hash scheme in order to distribute data sharing the same key across cluster nodes. However, this approach is not robust for the skewed data distribution. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes balancing their workloads. We implemented our proposed method on Hadoop. Through experiments, we evaluate the performance of the proposed method in comparison with the conventional method.",
keywords = "Key distribution, Load balance, MapReduce, Skew-tolerance",
author = "Jihoon Son and Hyunsik Choi and Chung, {Yon Dohn}",
year = "2012",
month = "2",
day = "1",
doi = "10.1587/transinf.E95.D.677",
language = "English",
volume = "E95-D",
pages = "677--680",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "2",

}

TY  - JOUR
T1  - Skew-tolerant key distribution for load balancing in MapReduce
AU  - Son, Jihoon
AU  - Choi, Hyunsik
AU  - Chung, Yon Dohn
PY  - 2012/2/1
Y1  - 2012/2/1
AB  - MapReduce is a parallel processing framework for large scale data. In the reduce phase, MapReduce employs the hash scheme in order to distribute data sharing the same key across cluster nodes. However, this approach is not robust for the skewed data distribution. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes balancing their workloads. We implemented our proposed method on Hadoop. Through experiments, we evaluate the performance of the proposed method in comparison with the conventional method.
KW  - Key distribution
KW  - Load balance
KW  - MapReduce
KW  - Skew-tolerance
UR  - http://www.scopus.com/inward/record.url?scp=84856389487&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84856389487&partnerID=8YFLogxK
U2  - 10.1587/transinf.E95.D.677
DO  - 10.1587/transinf.E95.D.677
M3  - Article
VL  - E95-D
SP  - 677
EP  - 680
JO  - IEICE Transactions on Information and Systems
JF  - IEICE Transactions on Information and Systems
SN  - 0916-8532
IS  - 2
ER  -