TY - GEN
T1 - Prediction confidence based low complexity gradient computation for accelerating DNN training
AU - Shin, Dongyeob
AU - Kim, Geonho
AU - Jo, Joongho
AU - Park, Jongsun
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea grant funded by the Korea government (No. NRF-2020R1A2C3014820), and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01433) supervised by the IITP (Institute for Information & communications Technology Promotion).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - In deep neural network (DNN) training, network weights are iteratively updated with the weight gradients obtained from stochastic gradient descent (SGD). Since SGD inherently tolerates noisy gradient calculations, approximating weight gradient computations has large potential for training energy/time savings without degrading accuracy. In this paper, we propose an input-dependent approximation of the weight gradient for improving the energy efficiency of the training process. Considering that the output prediction confidence of the network changes with the training inputs, the relation between the confidence and the magnitude of the weight gradient can be efficiently exploited to skip gradient computations without an accuracy drop, especially for high-confidence inputs. Under a given squared error constraint, the computation skip rate can also be controlled by changing the confidence threshold. Simulation results show that our approach can skip 72.6% of gradient computations for the CIFAR-100 dataset using ResNet-18 without accuracy degradation. A hardware implementation in a 65nm CMOS process shows that our design achieves maximum per-epoch training energy and time savings of 88.84% and 98.16%, respectively, for the CIFAR-100 dataset using ResNet-18 compared to a state-of-the-art training accelerator.
AB - In deep neural network (DNN) training, network weights are iteratively updated with the weight gradients obtained from stochastic gradient descent (SGD). Since SGD inherently tolerates noisy gradient calculations, approximating weight gradient computations has large potential for training energy/time savings without degrading accuracy. In this paper, we propose an input-dependent approximation of the weight gradient for improving the energy efficiency of the training process. Considering that the output prediction confidence of the network changes with the training inputs, the relation between the confidence and the magnitude of the weight gradient can be efficiently exploited to skip gradient computations without an accuracy drop, especially for high-confidence inputs. Under a given squared error constraint, the computation skip rate can also be controlled by changing the confidence threshold. Simulation results show that our approach can skip 72.6% of gradient computations for the CIFAR-100 dataset using ResNet-18 without accuracy degradation. A hardware implementation in a 65nm CMOS process shows that our design achieves maximum per-epoch training energy and time savings of 88.84% and 98.16%, respectively, for the CIFAR-100 dataset using ResNet-18 compared to a state-of-the-art training accelerator.
UR - http://www.scopus.com/inward/record.url?scp=85093928359&partnerID=8YFLogxK
U2 - 10.1109/DAC18072.2020.9218650
DO - 10.1109/DAC18072.2020.9218650
M3 - Conference contribution
AN - SCOPUS:85093928359
T3 - Proceedings - Design Automation Conference
BT - 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 57th ACM/IEEE Design Automation Conference, DAC 2020
Y2 - 20 July 2020 through 24 July 2020
ER -