TY - JOUR
T1 - Evaluating the visualization of what a deep neural network has learned
AU - Samek, Wojciech
AU - Binder, Alexander
AU - Montavon, Grégoire
AU - Lapuschkin, Sebastian
AU - Müller, Klaus-Robert
N1 - Funding Information:
Manuscript received June 1, 2016; revised August 7, 2016; accepted August 9, 2016. Date of publication August 25, 2016; date of current version October 16, 2017. This work was supported in part by the Brain Korea 21 Plus Program through the National Research Foundation of Korea funded by the Ministry of Education, in part by DFG under Grant MU 987/17-1, and in part by the German Ministry for Education and Research as Berlin Big Data Center under Grant 01IS14013A. (Wojciech Samek and Alexander Binder contributed equally to this work.) (Corresponding authors: Wojciech Samek; Alexander Binder; Klaus-Robert Müller.) W. Samek and S. Lapuschkin are with the Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: wojciech.samek@hhi.fraunhofer.de; sebastian.lapuschkin@hhi.fraunhofer.de).
Publisher Copyright:
© 2016 IEEE.
PY - 2017/11
Y1 - 2017/11
AB - Deep neural networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multilayer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision, given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the 'importance' of individual pixels with respect to the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper, we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012, and MIT Places data sets. Our main result is that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of the neural network performance.
KW - Convolutional neural networks
KW - Explaining classification
KW - Image classification
KW - Interpretable machine learning
KW - Relevance models
UR - http://www.scopus.com/inward/record.url?scp=84983621562&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2016.2599820
DO - 10.1109/TNNLS.2016.2599820
M3 - Article
C2 - 27576267
AN - SCOPUS:84983621562
SN - 2162-237X
VL - 28
SP - 2660
EP - 2673
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 11
M1 - 7552539
ER -