Compiler analysis for cache coherence: interprocedural array data-flow analysis and its impact on cache performance

Lynn Choi, Pen Chung Yew

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuses. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates cache invalidations at procedure boundaries, which were assumed in the previous compiler algorithms. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations on Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by the automatic stale reference detection. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.
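The idea of a stale reference can be illustrated with a small Python sketch. This is not the paper's algorithm or its Polaris implementation: it is a toy model written for illustration, and the epoch/trace representation, the function name `find_stale_reads`, and the rule that writes become visible at the end of an epoch are all assumptions. The sketch flags a read as potentially stale when a processor still holds a cached copy of an array element that some processor has overwritten in a later epoch.

```python
from collections import defaultdict


def find_stale_reads(epochs):
    """Return (epoch, proc, array, index) for each potentially stale read.

    epochs: list of epochs; each epoch is a list of accesses
            (proc, op, array, lo, hi), where op is "R" or "W" and
            lo..hi is an inclusive index range (a 1-D array region).
    Hypothetical model: no hardware coherence, so a cached copy is
    never invalidated automatically.
    """
    version = defaultdict(int)   # (array, index) -> count of writes so far
    cached = defaultdict(dict)   # proc -> {(array, index): version it holds}
    stale = []

    for e, accesses in enumerate(epochs):
        # Reads are checked against the state at the start of the epoch.
        for proc, op, array, lo, hi in accesses:
            if op != "R":
                continue
            for i in range(lo, hi + 1):
                key = (array, i)
                held = cached[proc].get(key)
                if held is not None and held < version[key]:
                    stale.append((e, proc, array, i))
                # After the (re)fetch the processor holds the current copy.
                cached[proc][key] = version[key]
        # Writes become visible to later epochs (an assumption of this toy).
        for proc, op, array, lo, hi in accesses:
            if op != "W":
                continue
            for i in range(lo, hi + 1):
                key = (array, i)
                version[key] += 1
                cached[proc][key] = version[key]
    return stale


epochs = [
    [(0, "W", "A", 0, 3), (1, "W", "A", 4, 7)],  # each processor writes half
    [(0, "R", "A", 0, 7)],                       # processor 0 caches all of A
    [(1, "W", "A", 4, 5)],                       # processor 1 overwrites A[4..5]
    [(0, "R", "A", 3, 6)],                       # A[4] and A[5] are now stale
]
print(find_stale_reads(epochs))
```

Only the two overwritten elements are flagged in the last epoch; the reads of A[3] and A[6] can safely hit in the cache. Tracking regions rather than whole arrays is what lets a model like this keep such reuse, which is the kind of precision the abstract attributes to array analysis over scalar analysis.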

Original language: English
Pages (from-to): 879-896
Number of pages: 18
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 11
Issue number: 9
DOIs: 10.1109/71.879772
Publication status: Published - 2000 Sep 1

Fingerprint

Data flow analysis
Cache Coherence
Data Flow
Compiler
Cache
Locality
Shared-memory multiprocessors
Data storage equipment
Parallelizing Compilers
Violate
Bottom-up
Computer hardware
Reuse
Eliminate
Scalar
Hardware
Benchmark
Minimise
Computing

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

@article{13b32d8b4db544989b3eecd4155a9046,
title = "Compiler analysis for cache coherence: interprocedural array data-flow analysis and its impact on cache performance",
abstract = "In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuses. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates cache invalidations at procedure boundaries, which were assumed in the previous compiler algorithms. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations on Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by the automatic stale reference detection. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.",
author = "Choi, Lynn and Yew, Pen Chung",
year = "2000",
month = "9",
day = "1",
doi = "10.1109/71.879772",
language = "English",
volume = "11",
pages = "879--896",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "9",
}

TY - JOUR

T1 - Compiler analysis for cache coherence: interprocedural array data-flow analysis and its impact on cache performance

AU - Choi, Lynn

AU - Yew, Pen Chung

PY - 2000/9/1

Y1 - 2000/9/1

AB - In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuses. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates cache invalidations at procedure boundaries, which were assumed in the previous compiler algorithms. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations on Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by the automatic stale reference detection. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.

UR - http://www.scopus.com/inward/record.url?scp=0034262613&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034262613&partnerID=8YFLogxK

U2 - 10.1109/71.879772

DO - 10.1109/71.879772

M3 - Article

AN - SCOPUS:0034262613

VL - 11

SP - 879

EP - 896

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 9

ER -