TY - GEN
T1 - Bit-width reduction and customized register for low cost convolutional neural network accelerator
AU - Choi, Kyungrak
AU - Choi, Woong
AU - Shin, Kyungho
AU - Park, Jongsun
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea grant funded by the Korea government (NRF-2016R1A2B4015329 and NRF-2015M3D1A1070465), and the Information Technology Research and Development Program of Korea Evaluation Institute of Industrial Technology [10052716, Design technology development of ultralow voltage operating circuit and IP for smart sensor SoC].
Publisher Copyright:
© 2017 IEEE.
PY - 2017/8/11
Y1 - 2017/8/11
AB - This paper presents a low-area, energy-efficient hardware accelerator for deep convolutional neural networks (CNNs). Building on a multiply-accumulate (MAC) based architecture, three design techniques are proposed to reduce the hardware cost of the convolutional computations. First, to reduce the computational bit-width of convolutions, an adaptive bit-width reduction scheme based on a differential input method is proposed. This approach reduces the operation bit-width by 37% with negligible degradation in CNN accuracy. Second, adopting a bi-directional filtering window in the CNN accelerator is found to considerably reduce the energy spent on data movement, because it requires far fewer memory accesses. Third, to expedite the bi-directional filtering operations, a bi-directional first-input-first-output buffer (bi-FIFO) is proposed. Laid out in the manner of SRAM bit cells, the proposed bi-FIFO enables fast data redistribution with good area and energy efficiency. To verify the effectiveness of the proposed techniques, an AlexNet accelerator has been designed. The numerical results show that the proposed adaptive bit-width reduction scheme achieves area and energy savings of 25.9% and 47.3%, respectively. The bi-FIFO based accelerator also achieves a 33% improvement in processing time.
KW - Convolutional Neural Network
KW - Deep Neural Network
KW - Energy Efficiency
KW - FIFO
KW - Filter
KW - Line Buffer
KW - Weight
UR - http://www.scopus.com/inward/record.url?scp=85028582850&partnerID=8YFLogxK
U2 - 10.1109/ISLPED.2017.8009164
DO - 10.1109/ISLPED.2017.8009164
M3 - Conference contribution
AN - SCOPUS:85028582850
T3 - Proceedings of the International Symposium on Low Power Electronics and Design
BT - ISLPED 2017 - IEEE/ACM International Symposium on Low Power Electronics and Design
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2017
Y2 - 24 July 2017 through 26 July 2017
ER -