TY - GEN
T1 - Access patern-aware cache management for improving data utilization in GPU
AU - Koo, Gunjae
AU - Oh, Yunho
AU - Ro, Won Woo
AU - Annavaram, Murali
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/6/24
Y1 - 2017/6/24
N2 - Long latency of memory operation is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (a collection of threads) creates signifcant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought by a warp load instruction is used only once, which is classifed as streaming data (2) data brought by a warp load is reused multiple times within the same warp, called intra-warp locality (3) data brought by a warp is reused multiple times but across different warps, called inter-warp locality (4) and some data exhibit both a mix of intra-and inter-warp locality. Furthermore, each load instruction exhibits consistently the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must be done using per-load locality type information, rather than applying warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on load locality characterization. Using an extensive set of simulations we show that APCM improves performance of GPUs by 34% for cache sensitive applications while saving 27% of energy consumption over baseline GPU.
AB - Long latency of memory operation is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (a collection of threads) creates signifcant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought by a warp load instruction is used only once, which is classifed as streaming data (2) data brought by a warp load is reused multiple times within the same warp, called intra-warp locality (3) data brought by a warp is reused multiple times but across different warps, called inter-warp locality (4) and some data exhibit both a mix of intra-and inter-warp locality. Furthermore, each load instruction exhibits consistently the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must be done using per-load locality type information, rather than applying warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on load locality characterization. Using an extensive set of simulations we show that APCM improves performance of GPUs by 34% for cache sensitive applications while saving 27% of energy consumption over baseline GPU.
KW - Cache management
KW - GPGPU
KW - Memory access patterns
UR - http://www.scopus.com/inward/record.url?scp=85025653325&partnerID=8YFLogxK
U2 - 10.1145/3079856.3080239
DO - 10.1145/3079856.3080239
M3 - Conference contribution
AN - SCOPUS:85025653325
T3 - Proceedings - International Symposium on Computer Architecture
SP - 307
EP - 319
BT - ISCA 2017 - 44th Annual International Symposium on Computer Architecture - Conference Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th Annual International Symposium on Computer Architecture - ISCA 2017
Y2 - 24 June 2017 through 28 June 2017
ER -