DQN-based OpenCL workload partition for performance optimization

Sanghyun Park, Taeweon Suh

Research output: Contribution to journal › Article

Abstract

This paper proposes a deep Q-network (DQN)-based method for the workload partition problem in OpenCL. The DQN, a reinforcement learning algorithm, optimizes the share of the workload assigned to each processing unit through self-training on performance data accumulated in the computing environment. Our experiments show that, in JPEG decoding, the DQN-based partition improves performance by up to 62.2% and 6.9% compared with the LuxMark-based and target-based partitions, respectively. The DQN captures low-level contention in slave devices, such as in caches and memory, as well as the communication bottleneck between devices, and reflects these effects in the workload partition ratio.
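The abstract describes the reinforcement-learning loop only at a high level. The following is a minimal Python sketch of that kind of loop, built on stated assumptions. It uses tabular Q-learning as a simpler stand-in, since a DQN would replace the Q-table with a neural network. The state is a hypothetical CPU share of the work-items in 10% steps. The actions lower that share, keep it, or raise it. The reward is the negative runtime from a made-up `makespan` cost model in which the CPU and GPU run concurrently and the GPU also pays a per-work-item transfer cost. The names `makespan`, `step`, `train`, and `greedy_share` and all numeric rates are illustrative. They are not the authors' design or measured data.

```python
import random

# Hypothetical discretization: fraction of work-items assigned to the CPU;
# the GPU receives the remainder.
SHARES = [i / 10 for i in range(11)]
ACTIONS = (-1, 0, +1)  # lower the CPU share one 10 % step, keep it, raise it one step


def makespan(cpu_share, work=1000.0, cpu_rate=100.0, gpu_rate=300.0, link_cost=0.003):
    """Toy cost model (an assumption, not measured data): both devices run
    concurrently, and the GPU also pays a per-work-item transfer cost."""
    cpu_time = cpu_share * work / cpu_rate
    gpu_items = (1.0 - cpu_share) * work
    gpu_time = gpu_items / gpu_rate + gpu_items * link_cost
    return max(cpu_time, gpu_time)


def step(state, action):
    """Apply an action, clamp to the valid range, and return (next_state, reward)."""
    nxt = min(max(state + ACTIONS[action], 0), len(SHARES) - 1)
    return nxt, -makespan(SHARES[nxt])  # shorter runtime => larger reward


def train(episodes=5000, horizon=10, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular epsilon-greedy Q-learning; a DQN would replace `q` with a network.
    The toy environment is deterministic, so a large step size is safe here."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in SHARES]
    for _ in range(episodes):
        state = rng.randrange(len(SHARES))  # random starting split
        for _ in range(horizon):
            if rng.random() < epsilon:  # explore
                action = rng.randrange(len(ACTIONS))
            else:  # exploit the current estimate
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            nxt, reward = step(state, action)
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_share(q, start=0, steps=20):
    """Follow the learned greedy policy and report the CPU share it settles on."""
    state = start
    for _ in range(steps):
        state, _ = step(state, max(range(len(ACTIONS)), key=lambda a: q[state][a]))
    return SHARES[state]


if __name__ == "__main__":
    q = train()
    print(greedy_share(q, start=0), greedy_share(q, start=10))
```

With these toy parameters the CPU time is 10 × share and the GPU time is 6.33 × (1 − share), so the best grid point is a 40% CPU share, which is where the greedy policy should settle from any starting split. Folding the transfer cost into the GPU side is one simple way to mimic the communication bottleneck the abstract mentions.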

Original language: English
Journal: Journal of Supercomputing
DOI: 10.1007/s11227-019-02766-0
Publication status: Published - 2019 Jan 1

Keywords

  • DQN
  • OpenCL
  • Workload partition

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture

Cite this

DQN-based OpenCL workload partition for performance optimization. / Park, Sanghyun; Suh, Taeweon.

In: Journal of Supercomputing, 01.01.2019.

@article{1ea5172c34364896a1af0ce29d90bef0,
title = "DQN-based OpenCL workload partition for performance optimization",
abstract = "This paper proposes a deep Q-network (DQN)-based method for the workload partition problem in OpenCL. The DQN, a reinforcement learning algorithm, optimizes the share of the workload assigned to each processing unit through self-training on performance data accumulated in the computing environment. Our experiments show that, in JPEG decoding, the DQN-based partition improves performance by up to 62.2{\%} and 6.9{\%} compared with the LuxMark-based and target-based partitions, respectively. The DQN captures low-level contention in slave devices, such as in caches and memory, as well as the communication bottleneck between devices, and reflects these effects in the workload partition ratio.",
keywords = "DQN, OpenCL, Workload partition",
author = "Sanghyun Park and Taeweon Suh",
year = "2019",
month = jan,
day = "1",
doi = "10.1007/s11227-019-02766-0",
language = "English",
journal = "The Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer Netherlands",

}

TY  - JOUR

T1  - DQN-based OpenCL workload partition for performance optimization

AU  - Park, Sanghyun

AU  - Suh, Taeweon

PY  - 2019/1/1

Y1  - 2019/1/1

N2  - This paper proposes a deep Q-network (DQN)-based method for the workload partition problem in OpenCL. The DQN, a reinforcement learning algorithm, optimizes the share of the workload assigned to each processing unit through self-training on performance data accumulated in the computing environment. Our experiments show that, in JPEG decoding, the DQN-based partition improves performance by up to 62.2% and 6.9% compared with the LuxMark-based and target-based partitions, respectively. The DQN captures low-level contention in slave devices, such as in caches and memory, as well as the communication bottleneck between devices, and reflects these effects in the workload partition ratio.

AB  - This paper proposes a deep Q-network (DQN)-based method for the workload partition problem in OpenCL. The DQN, a reinforcement learning algorithm, optimizes the share of the workload assigned to each processing unit through self-training on performance data accumulated in the computing environment. Our experiments show that, in JPEG decoding, the DQN-based partition improves performance by up to 62.2% and 6.9% compared with the LuxMark-based and target-based partitions, respectively. The DQN captures low-level contention in slave devices, such as in caches and memory, as well as the communication bottleneck between devices, and reflects these effects in the workload partition ratio.

KW  - DQN

KW  - OpenCL

KW  - Workload partition

UR  - http://www.scopus.com/inward/record.url?scp=85061327493&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85061327493&partnerID=8YFLogxK

U2  - 10.1007/s11227-019-02766-0

DO  - 10.1007/s11227-019-02766-0

M3  - Article

AN  - SCOPUS:85061327493

JO  - The Journal of Supercomputing

JF  - The Journal of Supercomputing

SN  - 0920-8542

ER  -