Enhancing Matrix Multiplication with a Monolithic 3D Based Scratchpad Memory

Cong Thuan Do, Jeong Hwan Choi, Young Seo Lee, Cheol Hong Kim, Sung Woo Chung

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)


Convolutional neural networks (CNNs) are among the most popular machine learning algorithms. The convolutional layers, which account for most of the execution time of CNNs, are implemented with matrix multiplication because the convolution operation performs dot products between filters and local regions of the input. GPUs with thousands of cores have been shown to accelerate matrix multiplication significantly compared to CPUs with a limited number of cores, especially for large matrices. However, the current memory architecture allows only one row access at a time, so multiple accesses are necessary to read the column data of the second matrix, which slows down matrix multiplication. In this study, we adopt monolithic 3D integration for the GPU scratchpad memory, called M3D SPM, to enhance matrix multiplication. The M3D SPM reads the column data of the second matrix in a single access, as is already the case for the row data of the first matrix. Simulation results show that the M3D SPM improves system performance by 46.3% for 32×32 matrix multiplication over the conventional 2D SPM, where the column data of the second matrix are read sequentially.
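The access pattern described in the abstract can be illustrated with a toy counting model. The sketch below is not the authors' simulator and does not reproduce the reported 46.3% figure. It assumes a hypothetical scratchpad (`ToySPM`) that stores one matrix row per memory row and returns one full row per access. With `column_access=False`, a column has to be gathered with one row access per element. With `column_access=True`, a whole column is returned in one access, which is the behavior the abstract attributes to the M3D SPM. All names here are illustrative, and register reuse, banking, and timing are ignored.

```python
class ToySPM:
    """Hypothetical scratchpad: one matrix row per memory row, one row per access."""

    def __init__(self, rows, column_access=False):
        self.rows = [list(r) for r in rows]
        self.column_access = column_access  # assumption: models single-access column reads
        self.accesses = 0

    def read_row(self, i):
        self.accesses += 1
        return self.rows[i]

    def read_column(self, j):
        if self.column_access:
            # Assumed M3D-style behavior: the whole column arrives in one access.
            self.accesses += 1
            return [r[j] for r in self.rows]
        # Conventional behavior: one row access per column element.
        return [self.read_row(i)[j] for i in range(len(self.rows))]


def matmul_with_access_count(a_rows, b_rows, column_access=False):
    """Multiply A by B and return (product, total scratchpad accesses)."""
    a = ToySPM(a_rows)                 # first matrix: always read by rows
    b = ToySPM(b_rows, column_access)  # second matrix: read by columns
    n, m = len(a_rows), len(b_rows[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        row = a.read_row(i)
        for j in range(m):
            col = b.read_column(j)
            c[i][j] = sum(x * y for x, y in zip(row, col))
    return c, a.accesses + b.accesses


if __name__ == "__main__":
    n = 32
    a = [[1] * n for _ in range(n)]
    b = [[1] * n for _ in range(n)]
    _, sequential = matmul_with_access_count(a, b, column_access=False)
    _, single = matmul_with_access_count(a, b, column_access=True)
    print("sequential column reads:", sequential)
    print("single-access column reads:", single)
```

Under these assumptions the number of accesses to the second matrix drops by a factor equal to the column length. Access counts are not execution time, so the ratio should not be read as a speedup estimate.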

Original language: English
Journal: IEEE Embedded Systems Letters
Publication status: Accepted/In press - 2020


Keywords

  • High performance
  • matrix multiplication
  • monolithic 3D
  • neural network
  • scratchpad memory

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)
