TY - JOUR
T1 - Optimizing TensorFlow performance by reconstructing the convolution routine
AU - Kim, Minseong
AU - Choi, Kyu Hyun
AU - Paik, Yoonah
AU - Kim, Seon Wook
N1 - Funding Information:
This work was partially supported by SK Telecom Co., LTD.
Publisher Copyright:
© 2021 The Institute of Electronics and Information Engineers
PY - 2021/4
Y1 - 2021/4
N2 - Using deep learning, we can build computational models composed of multiple processing layers that learn representations of data. Convolutional neural networks (CNNs) have been widely adopted, achieving significant performance in image recognition and classification. TensorFlow, an open-source deep learning framework from Google, uses profiling to select one convolution algorithm from among several available as the core of a CNN, to deliver the best performance in terms of execution time and memory usage. However, the profiling overhead is considerable, because TensorFlow executes and profiles all the available algorithms whenever an application is launched. We observe that memory usage overshoots during profiling, which limits data parallelism and thus fails to deliver maximum performance. In this paper, we present a novel profiling method that reduces this overhead by storing the profiling result from the first run and reusing it from the second run onward. Using Inception-V3, we achieved up to 1.12 times and 1.11 times higher throughput compared to vanilla TensorFlow and TensorFlow with XLA JIT compilation, respectively, without losing accuracy.
AB - Using deep learning, we can build computational models composed of multiple processing layers that learn representations of data. Convolutional neural networks (CNNs) have been widely adopted, achieving significant performance in image recognition and classification. TensorFlow, an open-source deep learning framework from Google, uses profiling to select one convolution algorithm from among several available as the core of a CNN, to deliver the best performance in terms of execution time and memory usage. However, the profiling overhead is considerable, because TensorFlow executes and profiles all the available algorithms whenever an application is launched. We observe that memory usage overshoots during profiling, which limits data parallelism and thus fails to deliver maximum performance. In this paper, we present a novel profiling method that reduces this overhead by storing the profiling result from the first run and reusing it from the second run onward. Using Inception-V3, we achieved up to 1.12 times and 1.11 times higher throughput compared to vanilla TensorFlow and TensorFlow with XLA JIT compilation, respectively, without losing accuracy.
KW - Batch
KW - Optimization
KW - Profiling
KW - TensorFlow
UR - http://www.scopus.com/inward/record.url?scp=85113318937&partnerID=8YFLogxK
U2 - 10.5573/IEIESPC.2021.10.2.128
DO - 10.5573/IEIESPC.2021.10.2.128
M3 - Article
AN - SCOPUS:85113318937
VL - 10
SP - 128
EP - 135
JO - IEIE Transactions on Smart Processing and Computing
JF - IEIE Transactions on Smart Processing and Computing
SN - 2287-5255
IS - 2
ER -