TY - JOUR
T1 - Memory streaming acceleration for embedded systems with CPU-accelerator cooperative data processing
AU - Lee, Kwangho
AU - Kong, Joonho
AU - Kim, Young Geun
AU - Chung, Sung Woo
N1 - Funding Information:
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A3B07045908).
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/11
Y1 - 2019/11
N2 - Memory streaming operations (i.e., memory-to-memory data transfers with or without simple arithmetic/logical operations) are among the most important tasks in general embedded/mobile computer systems. In this paper, we propose a technique to accelerate memory streaming operations. The conventional way to accelerate them is to employ direct memory access (DMA) with dedicated hardware accelerators for simple arithmetic/logical operations. In our technique, we utilize not only a hardware accelerator with DMA but also the central processing unit (CPU) to perform memory streaming operations, which improves the performance and energy efficiency of the system. We implemented a prototype on a field-programmable gate array system-on-chip (FPGA-SoC) platform and evaluated our technique with real measurements from the prototype. Our experimental results show that our technique improves memory streaming performance by 34.1–73.1% while reducing energy consumption by 29.0–45.5%. When we apply our technique to various real-world applications such as image processing, 1×1 convolution, and bias addition/scale, performance improves by 1.1×–2.4×. In addition, our technique reduces the energy consumption of image processing, 1×1 convolution, and bias addition/scale by 7.9–17.7%, 46.8–57.7%, and 41.7–58.5%, respectively.
AB - Memory streaming operations (i.e., memory-to-memory data transfers with or without simple arithmetic/logical operations) are among the most important tasks in general embedded/mobile computer systems. In this paper, we propose a technique to accelerate memory streaming operations. The conventional way to accelerate them is to employ direct memory access (DMA) with dedicated hardware accelerators for simple arithmetic/logical operations. In our technique, we utilize not only a hardware accelerator with DMA but also the central processing unit (CPU) to perform memory streaming operations, which improves the performance and energy efficiency of the system. We implemented a prototype on a field-programmable gate array system-on-chip (FPGA-SoC) platform and evaluated our technique with real measurements from the prototype. Our experimental results show that our technique improves memory streaming performance by 34.1–73.1% while reducing energy consumption by 29.0–45.5%. When we apply our technique to various real-world applications such as image processing, 1×1 convolution, and bias addition/scale, performance improves by 1.1×–2.4×. In addition, our technique reduces the energy consumption of image processing, 1×1 convolution, and bias addition/scale by 7.9–17.7%, 46.8–57.7%, and 41.7–58.5%, respectively.
KW - Accelerator
KW - Cooperative data transfer
KW - Direct memory access
KW - Heterogeneous computing
KW - Stream operation
UR - http://www.scopus.com/inward/record.url?scp=85072620894&partnerID=8YFLogxK
U2 - 10.1016/j.micpro.2019.102897
DO - 10.1016/j.micpro.2019.102897
M3 - Article
AN - SCOPUS:85072620894
SN - 0141-9331
VL - 71
JO - Microprocessors and Microsystems
JF - Microprocessors and Microsystems
M1 - 102897
ER -