Dynamic directory table with victim cache

on-demand allocation of directory entries for active shared cache blocks

Han Jun Bae, Lynn Choi

Research output: Contribution to journal › Article

Abstract

In this paper, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, substantially reducing the number of directory entries. Even for shared blocks, we allocate a directory entry only when the block is actively shared, further reducing the number of directory entries at runtime. To this end, we propose a new directory architecture called the dynamic directory table (DDT), a directory storage that is decoupled from the shared cache and dynamically maintains directory entries only for actively shared blocks. We also add a small victim cache to the original DDT to reduce the invalidation broadcasts caused by DDT evictions. Through detailed simulation on the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 16.09% of the directory area across a variety of workloads. This is achieved by the faster access and high hit rate of the small directory. In addition, we demonstrate that even smaller DDTs can deliver comparable or higher performance than recent directory optimization schemes such as SPACE and DGD, with considerably less area.
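The allocation policy described in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the class name ToyDDT, the LRU replacement, the per-block private-owner map, and the rule that a broadcast is counted whenever an entry is lost from the victim cache are all assumptions made for illustration. It only shows the general idea that a block gets a directory entry when a second core touches it, and that entries evicted from the table are parked in a small victim cache before their sharer information is lost.

```python
from collections import OrderedDict


class ToyDDT:
    """Toy model of on-demand directory allocation (illustrative assumptions only)."""

    def __init__(self, ddt_entries=4, victim_entries=2):
        self.ddt_entries = ddt_entries
        self.victim_entries = victim_entries
        self.private_owner = {}      # block -> single core using it (no directory entry)
        self.ddt = OrderedDict()     # block -> set of sharer cores, kept in LRU order
        self.victim = OrderedDict()  # entries evicted from the DDT, kept in LRU order
        self.broadcasts = 0          # entries lost entirely (assumed to force a broadcast)

    def access(self, core, block):
        if block in self.ddt:
            # Block is already tracked: record the sharer and refresh its LRU position.
            self.ddt[block].add(core)
            self.ddt.move_to_end(block)
            return "ddt_hit"
        if block in self.victim:
            # Sharer information survived eviction: move it back into the DDT.
            sharers = self.victim.pop(block)
            sharers.add(core)
            self._insert(block, sharers)
            return "victim_hit"
        owner = self.private_owner.get(block)
        if owner is None or owner == core:
            # Private block: no directory entry is allocated.
            self.private_owner[block] = core
            return "private"
        # A second core touches the block: allocate a directory entry on demand.
        del self.private_owner[block]
        self._insert(block, {owner, core})
        return "allocated"

    def _insert(self, block, sharers):
        if len(self.ddt) >= self.ddt_entries:
            old_block, old_sharers = self.ddt.popitem(last=False)  # evict LRU entry
            self._to_victim(old_block, old_sharers)
        self.ddt[block] = sharers

    def _to_victim(self, block, sharers):
        if self.victim_entries == 0:
            self.broadcasts += 1     # no victim cache: sharer list is lost immediately
            return
        if len(self.victim) >= self.victim_entries:
            self.victim.popitem(last=False)
            self.broadcasts += 1     # oldest victim entry is lost
        self.victim[block] = sharers
```

With one DDT entry, allocating entries for two shared blocks in turn costs one broadcast in this model when victim_entries is 0 and none when victim_entries is 1, which is the effect the victim cache is meant to have.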

Original language: English
Journal: Journal of Supercomputing
DOI: 10.1007/s11227-018-02735-z
Publication status: Accepted/In press - 2019 Jan 1

Keywords

  • Cache coherence
  • Directory
  • Multi-core architectures
  • Parallel processing
  • Scalable computing
  • Simulation

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture

Cite this

@article{267f12e30cc0477e92f9bff00868f96f,
title = "Dynamic directory table with victim cache: on-demand allocation of directory entries for active shared cache blocks",
abstract = "In this paper, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, substantially reducing the number of directory entries. Even for shared blocks, we allocate a directory entry only when the block is actively shared, further reducing the number of directory entries at runtime. To this end, we propose a new directory architecture called the dynamic directory table (DDT), a directory storage that is decoupled from the shared cache and dynamically maintains directory entries only for actively shared blocks. We also add a small victim cache to the original DDT to reduce the invalidation broadcasts caused by DDT evictions. Through detailed simulation on the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 16.09{\%} of the directory area across a variety of workloads. This is achieved by the faster access and high hit rate of the small directory. In addition, we demonstrate that even smaller DDTs can deliver comparable or higher performance than recent directory optimization schemes such as SPACE and DGD, with considerably less area.",
keywords = "Cache coherence, Directory, Multi-core architectures, Parallel processing, Scalable computing, Simulation",
author = "Bae, {Han Jun} and Choi, Lynn",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s11227-018-02735-z",
language = "English",
journal = "The Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - Dynamic directory table with victim cache

T2 - on-demand allocation of directory entries for active shared cache blocks

AU - Bae, Han Jun

AU - Choi, Lynn

PY - 2019/1/1

Y1 - 2019/1/1

N2 - In this paper, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, substantially reducing the number of directory entries. Even for shared blocks, we allocate a directory entry only when the block is actively shared, further reducing the number of directory entries at runtime. To this end, we propose a new directory architecture called the dynamic directory table (DDT), a directory storage that is decoupled from the shared cache and dynamically maintains directory entries only for actively shared blocks. We also add a small victim cache to the original DDT to reduce the invalidation broadcasts caused by DDT evictions. Through detailed simulation on the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 16.09% of the directory area across a variety of workloads. This is achieved by the faster access and high hit rate of the small directory. In addition, we demonstrate that even smaller DDTs can deliver comparable or higher performance than recent directory optimization schemes such as SPACE and DGD, with considerably less area.

KW - Cache coherence

KW - Directory

KW - Multi-core architectures

KW - Parallel processing

KW - Scalable computing

KW - Simulation

UR - http://www.scopus.com/inward/record.url?scp=85059565727&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85059565727&partnerID=8YFLogxK

U2 - 10.1007/s11227-018-02735-z

DO - 10.1007/s11227-018-02735-z

M3 - Article

JO - The Journal of Supercomputing

JF - The Journal of Supercomputing

SN - 0920-8542

ER -