TY - GEN
T1 - Program analysis for cache coherence
T2 - 25th International Conference on Parallel Processing, ICPP 1996
AU - Choi, L.
AU - Yew, Pen Chung
N1 - Funding Information:
We have implemented these algorithms in the Polaris parallelizing compiler [11] and demonstrated the performance gains of the new compiler algorithms by running execution-driven simulations of five Perfect benchmarks. The results show that, by avoiding cache invalidations, the intraprocedural algorithm eliminates up to 26.0% of the cache misses for a compiler-directed scheme compared to an existing invalidation-based algorithm [7]. With the full interprocedural analysis, up to 10.8% of additional cache misses can be removed. Acknowledgments. The research described in this paper was supported in part by NSF Grant Nos. MIP 89-20891 and MIP 93-07910 and by ARPA contract #DABT63-95-C-0097. This work is not necessarily representative of the positions or policies of the Army or the Government. This work was performed while the first author was at the University of Illinois. We thank Hock-Beng Lim at the University of Illinois for his valuable comments.
Publisher Copyright:
© 1996 IEEE.
PY - 1996
Y1 - 1996
N2 - The presence of procedures and procedure calls introduces side effects, which complicate the analysis of stale reference detection in compiler-directed cache coherence schemes. Previous compiler algorithms use cache invalidation at procedure boundaries or inlining to avoid interprocedural reference marking. We introduce a full interprocedural algorithm, which performs bottom-up and top-down analysis on the procedure call graph. This avoids unnecessary cache misses for subroutine-local data and exploits locality across procedure boundaries. The results of execution-driven simulations on Perfect benchmarks demonstrate that the interprocedural algorithm eliminates up to 36.8% of the cache misses for a compiler-directed scheme compared to an existing invalidation-based algorithm.
AB - The presence of procedures and procedure calls introduces side effects, which complicate the analysis of stale reference detection in compiler-directed cache coherence schemes. Previous compiler algorithms use cache invalidation at procedure boundaries or inlining to avoid interprocedural reference marking. We introduce a full interprocedural algorithm, which performs bottom-up and top-down analysis on the procedure call graph. This avoids unnecessary cache misses for subroutine-local data and exploits locality across procedure boundaries. The results of execution-driven simulations on Perfect benchmarks demonstrate that the interprocedural algorithm eliminates up to 36.8% of the cache misses for a compiler-directed scheme compared to an existing invalidation-based algorithm.
UR - http://www.scopus.com/inward/record.url?scp=33749929170&partnerID=8YFLogxK
U2 - 10.1109/ICPP.1996.538565
DO - 10.1109/ICPP.1996.538565
M3 - Conference contribution
AN - SCOPUS:33749929170
T3 - Proceedings of the International Conference on Parallel Processing
SP - 103
EP - 113
BT - Software
A2 - Pingali, K.
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 August 1996 through 16 August 1996
ER -