With the increasing demand for data-intensive applications, MapReduce technologies have become useful tools for developing large-scale applications efficiently by integrating existing MapReduce jobs. However, little existing research addresses workflow systems that can integrate MapReduce jobs with on-demand cloud resource provisioning. In this paper, we present a new cloud-based MapReduce workflow execution platform named DIVE-CWM (Distributed-parallel Virtual Environment on Cloud computing for Workflow for launching Mapreduce jobs), which integrates multiple MapReduce jobs and legacy programs into a single workflow. It provides transparent and selective job scheduling by estimating in advance the time a workflow needs to execute all of its jobs. It also supports an automatic resource provisioning scheme that offers on-demand VM resources to launch a workflow onto the cloud. Furthermore, it provides agent-based resource management for automatic job deployment and workflow execution on MapReduce clusters. Additionally, a service-oriented architecture based on a web service API and a graphical user interface offers high accessibility and convenience to users and other systems. We present experimental results that compare different scheduling schemes for various workflows.
ASJC Scopus subject areas
- Computer Networks and Communications