Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline

Jaegeun Oh, Seok Joong Hwang, Huong Giang Nguyen, Areum Kim, Seon Wook Kim, Chulwoo Kim, Jong-Kook Kim

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions while manipulating different data. This observation motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 135% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks automatically parallelized by the Intel compiler.
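The shared-front-end, private-back-end idea can be illustrated with a small simulation. This is a toy model under stated assumptions, not the MLEP microarchitecture or the AE32000C instruction set: it uses a hypothetical four-opcode ISA and hypothetical names (`ThreadContext`, `execute`, `run_lockstep`), and it assumes a branch-free loop body in which each thread runs one iteration. Each instruction is fetched and decoded once, then every thread context applies it to its own registers and memory addresses.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction: (opcode, reg_a, reg_b, operand).
# This is NOT the AE32000C ISA; it exists only to illustrate lockstep execution.
Instr = tuple[str, str, str, object]


@dataclass
class ThreadContext:
    """Private per-thread state (stands in for the duplicated back-end units)."""
    tid: int
    regs: dict[str, int] = field(default_factory=dict)


def execute(ctx: ThreadContext, instr: Instr, memory: list[int]) -> None:
    """Execute one already-decoded instruction on a single thread's own data."""
    op, a, b, c = instr
    if op == "ld":      # a <- memory[b + c]
        ctx.regs[a] = memory[ctx.regs[b] + c]
    elif op == "st":    # memory[b + c] <- a
        memory[ctx.regs[b] + c] = ctx.regs[a]
    elif op == "add":   # a <- b + c (both registers)
        ctx.regs[a] = ctx.regs[b] + ctx.regs[c]
    elif op == "mul":   # a <- b * c (both registers)
        ctx.regs[a] = ctx.regs[b] * ctx.regs[c]
    else:
        raise ValueError(f"unknown opcode: {op}")


def run_lockstep(program: list[Instr], num_threads: int,
                 memory: list[int]) -> tuple[list[ThreadContext], int]:
    """Run a branch-free program on num_threads contexts in lockstep.

    Each instruction is fetched/decoded once (shared front end) and then
    executed by every thread context (private back ends). Returns the
    contexts and the number of shared fetch/decode operations.
    """
    # Each thread starts with its ID in a register so it can index its own data.
    contexts = [ThreadContext(tid=t, regs={"tid": t}) for t in range(num_threads)]
    fetches = 0
    for instr in program:        # single shared fetch/decode stream
        fetches += 1
        for ctx in contexts:     # same instruction, different data per thread
            execute(ctx, instr, memory)
    return contexts, fetches


if __name__ == "__main__":
    # Assumed memory layout: a[0:4] at 0, b[0:4] at 4, c[0:4] at 8.
    memory = [1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0]
    # Loop body c[i] = a[i] * b[i]; thread t handles iteration i = t.
    body = [
        ("ld", "x", "tid", 0),
        ("ld", "y", "tid", 4),
        ("mul", "z", "x", "y"),
        ("st", "z", "tid", 8),
    ]
    contexts, fetches = run_lockstep(body, num_threads=4, memory=memory)
    print(memory[8:12])                       # [5, 12, 21, 32]
    print(fetches, fetches * len(contexts))   # 4 16
```

In this model, four shared fetch/decode steps drive sixteen per-thread executions, which is the kind of front-end saving the abstract attributes to sharing one fetch and one decode unit. The sketch deliberately omits branches; how MLEP handles iterations whose control flow differs is not covered by the abstract.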

Original language: English
Pages (from-to): 576-586
Number of pages: 11
Journal: ETRI Journal
Volume: 30
Issue number: 4
Publication status: Published - 2008 Aug 1

Keywords

  • CMP
  • ILP
  • MLEP
  • SMT
  • TLP

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Networks and Communications

Cite this

Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline. / Oh, Jaegeun; Hwang, Seok Joong; Nguyen, Huong Giang; Kim, Areum; Kim, Seon Wook; Kim, Chulwoo; Kim, Jong-Kook.

In: ETRI Journal, Vol. 30, No. 4, 01.08.2008, p. 576-586.

Research output: Contribution to journal › Article

Oh, Jaegeun ; Hwang, Seok Joong ; Nguyen, Huong Giang ; Kim, Areum ; Kim, Seon Wook ; Kim, Chulwoo ; Kim, Jong-Kook. / Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline. In: ETRI Journal. 2008 ; Vol. 30, No. 4. pp. 576-586.
@article{ee2e5bb7dc9b4cf0a8dc581a310dceb9,
title = "Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline",
abstract = "In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions while manipulating different data. This observation motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 135{\%} faster with a 2-way MLEP and 33.7{\%} faster with a 4-way MLEP on EEMBC benchmarks automatically parallelized by the Intel compiler.",
keywords = "CMP, ILP, MLEP, SMT, TLP",
author = "Jaegeun Oh and Hwang, {Seok Joong} and Nguyen, {Huong Giang} and Areum Kim and Kim, {Seon Wook} and Chulwoo Kim and Jong-Kook Kim",
year = "2008",
month = "8",
day = "1",
language = "English",
volume = "30",
pages = "576--586",
journal = "ETRI Journal",
issn = "1225-6463",
publisher = "ETRI",
number = "4",

}

TY - JOUR

T1 - Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline

AU - Oh, Jaegeun

AU - Hwang, Seok Joong

AU - Nguyen, Huong Giang

AU - Kim, Areum

AU - Kim, Seon Wook

AU - Kim, Chulwoo

AU - Kim, Jong-Kook

PY - 2008/8/1

Y1 - 2008/8/1

N2 - In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions while manipulating different data. This observation motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 135% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks automatically parallelized by the Intel compiler.

AB - In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions while manipulating different data. This observation motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 135% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks automatically parallelized by the Intel compiler.

KW - CMP

KW - ILP

KW - MLEP

KW - SMT

KW - TLP

UR - http://www.scopus.com/inward/record.url?scp=49449105358&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49449105358&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:49449105358

VL - 30

SP - 576

EP - 586

JO - ETRI Journal

JF - ETRI Journal

SN - 1225-6463

IS - 4

ER -