TY - JOUR
T1 - Filter cache
T2 - filtering useless cache blocks for a small but efficient shared last-level cache
AU - Bae, Han Jun
AU - Choi, Lynn
N1 - Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) and funded by the Ministry of Science, ICT and Future Planning (NRF-2017R1A2B2009641). This research was also supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2019-2015-0-00363) supervised by the IITP (Institute for Information & Communications Technology Promotion). This research was supported by Korea University.
PY - 2020/10/1
Y1 - 2020/10/1
N2 - Although the shared last-level cache (SLLC) occupies a significant portion of multicore CPU chip die area, more than 59% of SLLC cache blocks are not reused during their lifetime. If we can filter out these useless blocks from the SLLC, we can effectively reduce the size of the SLLC without sacrificing performance. For this purpose, we classify the reuse of cache blocks into temporal and spatial reuse and further analyze the reuse by using reuse interval and reuse count. From our experiments, we found that most spatially reused cache blocks are reused only once, with a short reuse interval, so it is inefficient to manage them in the SLLC. In this paper, we propose adding a small cache, called Filter Cache, to the SLLC, which can not only check temporal reuse but also prevent spatially reused blocks from entering the SLLC. Thus, we do not maintain data for non-reused blocks and spatially reused blocks in the SLLC, dramatically reducing the size of the SLLC. Through detailed simulation on PARSEC benchmarks, we show that our new SLLC design with Filter Cache exhibits performance comparable to the conventional SLLC with only 24.21% of the SLLC area across a variety of workloads. This is achieved by the faster access and high reuse rates of the small SLLC with Filter Cache.
AB - Although the shared last-level cache (SLLC) occupies a significant portion of multicore CPU chip die area, more than 59% of SLLC cache blocks are not reused during their lifetime. If we can filter out these useless blocks from the SLLC, we can effectively reduce the size of the SLLC without sacrificing performance. For this purpose, we classify the reuse of cache blocks into temporal and spatial reuse and further analyze the reuse by using reuse interval and reuse count. From our experiments, we found that most spatially reused cache blocks are reused only once, with a short reuse interval, so it is inefficient to manage them in the SLLC. In this paper, we propose adding a small cache, called Filter Cache, to the SLLC, which can not only check temporal reuse but also prevent spatially reused blocks from entering the SLLC. Thus, we do not maintain data for non-reused blocks and spatially reused blocks in the SLLC, dramatically reducing the size of the SLLC. Through detailed simulation on PARSEC benchmarks, we show that our new SLLC design with Filter Cache exhibits performance comparable to the conventional SLLC with only 24.21% of the SLLC area across a variety of workloads. This is achieved by the faster access and high reuse rates of the small SLLC with Filter Cache.
KW - Cache organization
KW - Multicore CPU
KW - Reuse rate
KW - Shared last-level cache
KW - Spatial reuse
KW - Temporal reuse
UR - http://www.scopus.com/inward/record.url?scp=85078739952&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078739952&partnerID=8YFLogxK
U2 - 10.1007/s11227-020-03177-2
DO - 10.1007/s11227-020-03177-2
M3 - Article
AN - SCOPUS:85078739952
VL - 76
SP - 7521
EP - 7544
JO - The Journal of Supercomputing
JF - The Journal of Supercomputing
SN - 0920-8542
IS - 10
ER -