TY - JOUR
T1 - Coherence and Replacement Protocol of DICE - A Bus-Based COMA Multiprocessor
AU - Cho, Sangyeun
AU - Kong, Jinseok
AU - Lee, Gyungho
N1 - Funding Information:
We would like to thank the former members of the DICE project: Manu Agarwal, Sujat Jamil, Bland Quattlebaum, and Professor Larry Kinney in the Electrical and Computer Engineering Department, University of Minnesota. We also appreciate the constructive comments made by anonymous referees, which greatly helped improve the quality of this paper. The DICE project was supported by funding from Samsung Electronics, Seoul, Korea and by a DoD AFOSR grant under Contract F49620-96-1-0472. Sangyeun Cho was supported in part by a fellowship from the Korea Foundation for Advanced Studies.
PY - 1999/4
Y1 - 1999/4
AB - As microprocessors become faster and demand more bandwidth, the already limited scalability of a shared bus decreases even further. DICE, a shared-bus multiprocessor, utilizes a cache-only memory architecture (COMA) to effectively decrease the speed gap between modern high-performance microprocessors and the bus. DICE tries to optimize COMA for a shared-bus medium, in particular to reduce the detrimental effects of cache coherence and the "last memory block" problem on replacement. In this paper, we present the coherence and replacement protocol of the DICE multiprocessor and its design trade-offs. We describe a four-state write-invalidate coherence protocol in detail. Replacement, which poses a unique overhead problem of COMA, requires that a victim block with ownership be relocated to a remote node in order not to discard the last cached memory block. We show that the relocation process can be efficiently implemented by using a temporary storage area called a relocation buffer and a priority-based selection algorithm. We present performance results that show a drastic reduction in global bus traffic compared to a traditional shared-bus multiprocessor architecture.
KW - Distributed shared memory (DSM)
KW - Shared bus
KW - Symmetric multiprocessor (SMP)
UR - http://www.scopus.com/inward/record.url?scp=0043231129&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1998.1524
DO - 10.1006/jpdc.1998.1524
M3 - Article
AN - SCOPUS:0043231129
SN - 0743-7315
VL - 57
SP - 14
EP - 32
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 1
ER -