Performance impact of JobTracker failure in Hadoop

Young Pil Kim, Cheol Ho Hong, Hyuck Yoo

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

In this paper, we analyze the performance impact of JobTracker failure in Hadoop. A JobTracker failure is a serious problem that degrades overall job processing performance. We describe the causes of failure and the resulting system behavior when job processing fails in Hadoop. Based on this analysis, we build a job completion time model that reflects the effects of failure. Our model is based on a stochastic process with a node crash probability. Using this model, we simulate the performance impact with highly credible failure data from the USENIX Computer Failure Data Repository (CFDR), which has been collected over the past nine years. The results show that the performance impact is severe: job completion time typically becomes about four times longer and, in the worst case, up to 68 times longer.
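The abstract does not spell out the model itself. As a rough illustration of how a crash probability turns into a completion-time penalty, the following is a minimal Python sketch of a textbook restart-on-failure model, not the authors' model or their CFDR-driven simulation. It assumes (all hypothetical): JobTracker crashes arrive as a Poisson process with rate λ; a crash discards all progress of the running job; the job is resubmitted after a fixed recovery delay R; and no crash occurs during recovery. Under these assumptions a job needing T failure-free hours has expected completion time E[T_total] = (1/λ + R)(e^{λT} − 1). The function names and parameter values are illustrative only.

```python
import math
import random


def expected_completion_time(t_job: float, crash_rate: float, recovery: float) -> float:
    """Closed-form mean completion time under a restart-from-scratch model.

    Hypothetical assumptions (not from the paper): Poisson crashes with rate
    `crash_rate`, every crash loses all job progress, the job is resubmitted
    after a fixed `recovery` delay, and no crash happens during recovery.
    """
    if crash_rate == 0:
        return t_job  # no failures: the job runs once, uninterrupted
    # expm1(x) = e^x - 1, computed accurately for small x
    return (1.0 / crash_rate + recovery) * math.expm1(crash_rate * t_job)


def simulate_completion_time(t_job: float, crash_rate: float, recovery: float,
                             rng: random.Random) -> float:
    """One Monte Carlo draw of a job's completion time under the same model."""
    if crash_rate == 0:
        return t_job
    elapsed = 0.0
    while True:
        time_to_crash = rng.expovariate(crash_rate)  # exponential inter-crash time
        if time_to_crash >= t_job:
            return elapsed + t_job  # the attempt finishes before the next crash
        elapsed += time_to_crash + recovery  # progress lost; wait, then resubmit


def slowdown(t_job: float, crash_rate: float, recovery: float) -> float:
    """Ratio of expected completion time to the failure-free job length."""
    return expected_completion_time(t_job, crash_rate, recovery) / t_job


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative values only: hours, crashes per hour, hours
    t_job, crash_rate, recovery = 10.0, 0.1, 0.5
    exact = expected_completion_time(t_job, crash_rate, recovery)
    trials = 100_000
    simulated = sum(
        simulate_completion_time(t_job, crash_rate, recovery, rng) for _ in range(trials)
    ) / trials
    print(f"closed form: {exact:.2f} h, simulated: {simulated:.2f} h, "
          f"slowdown: {slowdown(t_job, crash_rate, recovery):.2f}x")
```

With the illustrative values T = 10 h, λ = 0.1 crashes per hour and R = 0.5 h, λT = 1, so the closed form gives 10.5 × (e − 1) ≈ 18.0 h, roughly a 1.8× slowdown; the Monte Carlo loop serves only as a sanity check on the formula. Because the penalty grows with e^{λT}, long-running jobs are hit disproportionately hard under this kind of model.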

Original language: English
Pages (from-to): 1265-1281
Number of pages: 17
Journal: International Journal of Communication Systems
Volume: 28
Issue number: 7
DOI: 10.1002/dac.2759
Publication status: Published (10 May 2015)

Keywords

  • failure analysis
  • Hadoop
  • JobTracker
  • large-scale data processing

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Networks and Communications

Cite this

Performance impact of JobTracker failure in Hadoop. / Kim, Young Pil; Hong, Cheol Ho; Yoo, Hyuck.

In: International Journal of Communication Systems, Vol. 28, No. 7, 10.05.2015, p. 1265-1281.

Kim, Young Pil; Hong, Cheol Ho; Yoo, Hyuck. / Performance impact of JobTracker failure in Hadoop. In: International Journal of Communication Systems. 2015; Vol. 28, No. 7. pp. 1265-1281.

@article{870d8fce815641dca6ae99d8db3e6b98,
title = "Performance impact of JobTracker failure in Hadoop",
keywords = "failure analysis, Hadoop, JobTracker, large-scale data processing",
author = "Kim, {Young Pil} and Hong, {Cheol Ho} and Hyuck Yoo",
year = "2015",
month = "5",
day = "10",
doi = "10.1002/dac.2759",
language = "English",
volume = "28",
pages = "1265--1281",
journal = "International Journal of Communication Systems",
issn = "1074-5351",
publisher = "John Wiley and Sons Ltd",
number = "7",

}

TY  - JOUR
T1  - Performance impact of JobTracker failure in Hadoop
AU  - Kim, Young Pil
AU  - Hong, Cheol Ho
AU  - Yoo, Hyuck
PY  - 2015/5/10
Y1  - 2015/5/10
KW  - failure analysis
KW  - Hadoop
KW  - JobTracker
KW  - large-scale data processing
UR  - http://www.scopus.com/inward/record.url?scp=84925345925&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84925345925&partnerID=8YFLogxK
U2  - 10.1002/dac.2759
DO  - 10.1002/dac.2759
M3  - Article
AN  - SCOPUS:84925345925
VL  - 28
SP  - 1265
EP  - 1281
JO  - International Journal of Communication Systems
JF  - International Journal of Communication Systems
SN  - 1074-5351
IS  - 7
ER  -