Optimizing TensorFlow performance by reconstructing the convolution routine

Minseong Kim, Kyu Hyun Choi, Yoonah Paik, Seon Wook Kim

Research output: Contribution to journal › Article › peer-review


Deep learning makes it possible to build computational models composed of multiple processing layers that learn representations of data. Convolutional neural networks (CNNs) have been widely adopted in image recognition and classification because of their strong performance. For convolution, the core operation of a CNN, TensorFlow, an open-source deep learning framework from Google, uses profiling to select one of several available algorithms, choosing the one that delivers the best performance in terms of execution time and memory usage. However, this profiling incurs considerable overhead, because TensorFlow executes and profiles all the available algorithms every time an application is launched. We also observe that memory usage overshoots during profiling, which limits data parallelism and thus prevents TensorFlow from delivering maximum performance. In this paper, we present a novel profiling method that reduces this overhead by storing the profile result from the first run and reusing it in all subsequent runs. On Inception-V3, our method achieves up to 1.12 times and 1.11 times higher throughput than vanilla TensorFlow and TensorFlow with XLA JIT compilation, respectively, without losing accuracy.
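The store-and-reuse idea described in the abstract can be illustrated with a small profile cache. This is a minimal Python sketch under stated assumptions, not the authors' implementation and not TensorFlow's internal API: the `ConvKey` layout, the `ProfileCache` class, the JSON file, and the candidate callables are all hypothetical stand-ins for TensorFlow's convolution algorithms. It also assumes that "best performance in terms of execution time and memory usage" means the fastest candidate whose memory requirement fits within a given limit.

```python
import json
import os
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical key describing one convolution configuration:
# (input shape, filter shape, strides, padding, dtype).
ConvKey = Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...], str, str]


class ProfileCache:
    """Hypothetical persistent table of the algorithm chosen per convolution.

    First run: profile every candidate and record the winner on disk.
    Later runs: load the recorded winner and skip profiling entirely.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.table: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.table = json.load(f)

    @staticmethod
    def _encode(key: ConvKey) -> str:
        # JSON object keys must be strings, so serialize the tuple.
        return json.dumps(key)

    def select(self, key: ConvKey,
               candidates: Dict[str, Callable[[], None]],
               memory_of: Callable[[str], int],
               memory_limit: int) -> str:
        k = self._encode(key)
        if k in self.table:
            return self.table[k]  # reuse the stored result; no profiling
        best_name: Optional[str] = None
        best_time = float("inf")
        for name, run in candidates.items():
            if memory_of(name) > memory_limit:
                continue  # skip algorithms whose memory need does not fit
            start = time.perf_counter()
            run()  # execute the candidate once to time it
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best_name, best_time = name, elapsed
        if best_name is None:
            raise RuntimeError("no candidate fits within the memory limit")
        self.table[k] = best_name
        with open(self.path, "w") as f:
            json.dump(self.table, f)  # persist for subsequent runs
        return best_name
```

On a second launch, constructing `ProfileCache` with the same path loads the stored choice, so `select` returns immediately without executing any candidate. A real integration would likely also need to invalidate the stored table when the GPU, driver, or library version changes, since the fastest algorithm can differ across them.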

Original language: English
Pages (from-to): 128-135
Number of pages: 8
Journal: IEIE Transactions on Smart Processing and Computing
Issue number: 2
Publication status: Published - 2021 Apr

Keywords

  • Batch
  • Optimization
  • Profiling
  • TensorFlow

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering

