A Low-power network-on-chip architecture for tile-based chip multi-processors

Anastasios Psarras, Junghee Lee, Pavlos Mattheakis, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

Technology scaling of tile-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.

Original language: English
Title of host publication: GLSVLSI 2016 - Proceedings of the 2016 ACM Great Lakes Symposium on VLSI
Publisher: Association for Computing Machinery
Pages: 335-340
Number of pages: 6
ISBN (Electronic): 9781450342742
DOI: https://doi.org/10.1145/2902961.2903010
Publication status: Published - 2016 May 18
Externally published: Yes
Event: 26th ACM Great Lakes Symposium on VLSI, GLSVLSI 2016 - Boston, United States
Duration: 2016 May 18 to 2016 May 20

Publication series

Name: Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
Volume: 18-20-May-2016

Conference

Conference: 26th ACM Great Lakes Symposium on VLSI, GLSVLSI 2016
Country: United States
City: Boston
Period: 16/5/18 to 16/5/20

Fingerprint

Tile
Clocks
Wire
Network-on-chip
Routers
Throughput
Hardware

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Psarras, A., Lee, J., Mattheakis, P., Nicopoulos, C., & Dimitrakopoulos, G. (2016). A Low-power network-on-chip architecture for tile-based chip multi-processors. In GLSVLSI 2016 - Proceedings of the 2016 ACM Great Lakes Symposium on VLSI (pp. 335-340). (Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI; Vol. 18-20-May-2016). Association for Computing Machinery. https://doi.org/10.1145/2902961.2903010

@inproceedings{41c2b1e1c77e48e5876fcbd096cfe66a,
title = "A Low-power network-on-chip architecture for tile-based chip multi-processors",
abstract = "Technology scaling of tile-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.",
author = "Anastasios Psarras and Junghee Lee and Pavlos Mattheakis and Chrysostomos Nicopoulos and Giorgos Dimitrakopoulos",
year = "2016",
month = "5",
day = "18",
doi = "10.1145/2902961.2903010",
language = "English",
series = "Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI",
publisher = "Association for Computing Machinery",
pages = "335--340",
booktitle = "GLSVLSI 2016 - Proceedings of the 2016 ACM Great Lakes Symposium on VLSI",
}

TY - GEN
T1 - A Low-power network-on-chip architecture for tile-based chip multi-processors
AU - Psarras, Anastasios
AU - Lee, Junghee
AU - Mattheakis, Pavlos
AU - Nicopoulos, Chrysostomos
AU - Dimitrakopoulos, Giorgos
PY - 2016/5/18
Y1 - 2016/5/18
N2 - Technology scaling of tile-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.
AB - Technology scaling of tile-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.
UR - http://www.scopus.com/inward/record.url?scp=84974733264&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974733264&partnerID=8YFLogxK
U2 - 10.1145/2902961.2903010
DO - 10.1145/2902961.2903010
M3 - Conference contribution
AN - SCOPUS:84974733264
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 335
EP - 340
BT - GLSVLSI 2016 - Proceedings of the 2016 ACM Great Lakes Symposium on VLSI
PB - Association for Computing Machinery
ER -