Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing

Junghee Lee, Chrysostomos Nicopoulos, Hyung Gyu Lee, Jongman Kim

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Packet-based networks-on-chip (NoC) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting enlargement of the on-chip network size has been accompanied by an equivalent widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing of the physical channel enhances link utilization, it incurs additional delay, because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth "sharding" (i.e., partitioning) and stealing in order to mitigate the elevation in the zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router becomes identical to that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19% and the execution time of real multi-threaded workloads by up to 43%. Finally, hardware synthesis analysis verifies the modest area overhead of the Sharded Router over a conventional design.
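The trade-off the abstract describes between link utilization and flit count can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the packet sizes (72 and 576 bits) and channel widths (128 and 32 bits) are hypothetical values chosen only to show the effect, and the function names are illustrative.

```python
# Illustrative arithmetic only; sizes and widths are hypothetical,
# not figures reported in the paper.

def flits_per_packet(packet_bits: int, channel_bits: int) -> int:
    """Number of flits needed to carry a packet over a channel (ceiling division)."""
    return -(-packet_bits // channel_bits)


def link_utilization(packet_bits: int, channel_bits: int) -> float:
    """Fraction of the transferred link bits that carry actual packet data."""
    flits = flits_per_packet(packet_bits, channel_bits)
    return packet_bits / (flits * channel_bits)


if __name__ == "__main__":
    for packet_bits in (72, 576):          # hypothetical short and long packets
        for channel_bits in (128, 32):     # hypothetical wide and sliced channels
            flits = flits_per_packet(packet_bits, channel_bits)
            util = link_utilization(packet_bits, channel_bits)
            print(f"packet={packet_bits:4d}b channel={channel_bits:3d}b "
                  f"flits={flits:2d} utilization={util:.2%}")
```

Under these assumed numbers, the 72-bit packet fills only part of a single 128-bit flit, while on a 32-bit slice it needs three flits with less padding. The narrower channel wastes fewer bits but takes more flits per packet, which is the added serialization delay the abstract says the Sharded Router aims to mitigate.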

Original language: English
Pages (from-to): 372-388
Number of pages: 17
Journal: Parallel Computing
Volume: 39
Issue number: 9
DOI: 10.1016/j.parco.2013.04.004
Publication status: Published - 2013 Jan 1
Externally published: Yes

Fingerprint

Router
Routers
Chip
Bandwidth
Latency
Slicing
Many-core
Simulation Framework
Enlargement
Interconnection Networks
System Simulation
Execution Time
Telecommunication links
Workload
Architecture
Partitioning
Throughput
Die
Hardware
Synthesis

Keywords

  • Bandwidth slicing
  • Channel width
  • Link bit-width
  • Network-on-chip
  • Physically segregated networks

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

Cite this

Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing. / Lee, Junghee; Nicopoulos, Chrysostomos; Lee, Hyung Gyu; Kim, Jongman.

In: Parallel Computing, Vol. 39, No. 9, 01.01.2013, p. 372-388.

Lee, Junghee; Nicopoulos, Chrysostomos; Lee, Hyung Gyu; Kim, Jongman. / Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing. In: Parallel Computing. 2013; Vol. 39, No. 9. pp. 372-388.
@article{9a233f3eb3dc4869bd4cc1f4f96974b5,
title = "Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing",
abstract = "Packet-based networks-on-chip (NoC) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting enlargement of the on-chip network size has been accompanied by an equivalent widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing of the physical channel enhances link utilization, it incurs additional delay, because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth ``sharding'' (i.e., partitioning) and stealing in order to mitigate the elevation in the zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router becomes identical to that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19{\%} and the execution time of real multi-threaded workloads by up to 43{\%}. Finally, hardware synthesis analysis verifies the modest area overhead of the Sharded Router over a conventional design.",
keywords = "Bandwidth slicing, Channel width, Link bit-width, Network-on-chip, Physically segregated networks",
author = "Junghee Lee and Chrysostomos Nicopoulos and Lee, {Hyung Gyu} and Jongman Kim",
year = "2013",
month = "1",
day = "1",
doi = "10.1016/j.parco.2013.04.004",
language = "English",
volume = "39",
pages = "372--388",
journal = "Parallel Computing",
issn = "0167-8191",
publisher = "Elsevier",
number = "9",

}

TY - JOUR

T1 - Sharded Router

T2 - A novel on-chip router architecture employing bandwidth sharding and stealing

AU - Lee, Junghee

AU - Nicopoulos, Chrysostomos

AU - Lee, Hyung Gyu

AU - Kim, Jongman

PY - 2013/1/1

Y1 - 2013/1/1

AB - Packet-based networks-on-chip (NoC) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting enlargement of the on-chip network size has been accompanied by an equivalent widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing of the physical channel enhances link utilization, it incurs additional delay, because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth "sharding" (i.e., partitioning) and stealing in order to mitigate the elevation in the zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router becomes identical to that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19% and the execution time of real multi-threaded workloads by up to 43%. Finally, hardware synthesis analysis verifies the modest area overhead of the Sharded Router over a conventional design.

KW - Bandwidth slicing

KW - Channel width

KW - Link bit-width

KW - Network-on-chip

KW - Physically segregated networks

UR - http://www.scopus.com/inward/record.url?scp=84885372396&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885372396&partnerID=8YFLogxK

U2 - 10.1016/j.parco.2013.04.004

DO - 10.1016/j.parco.2013.04.004

M3 - Article

AN - SCOPUS:84885372396

VL - 39

SP - 372

EP - 388

JO - Parallel Computing

JF - Parallel Computing

SN - 0167-8191

IS - 9

ER -