Interpretable human action recognition in compressed domain

Vignesh Srinivasan, Sebastian Lapuschkin, Cornelius Hellge, Klaus Muller, Wojciech Samek

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Compressed domain human action recognition algorithms are extremely efficient, because they only require a partial decoding of the video bit stream. However, the question of what exactly makes these algorithms decide for a particular action is still a mystery. In this paper, we present a general method, Layer-wise Relevance Propagation (LRP), to understand and interpret action recognition algorithms, and apply it to a state-of-the-art compressed domain method based on Fisher vector encoding and SVM classification. By using LRP, the classifier's decisions are propagated back through every step in the action recognition pipeline until the input is reached. This methodology makes it possible to identify where and when the important (from the classifier's perspective) action happens in the video. To our knowledge, this is the first work to interpret a compressed domain action recognition algorithm. We evaluate our method on the HMDB51 dataset and show that in many cases a few significant frames contribute most towards the prediction of the video to a particular class.
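
The back-propagation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear classifier and plain sum pooling of per-frame descriptors in place of the full Fisher vector encoding, and it splits the bias equally across features, which is only one possible convention. The function names `lrp_linear` and `lrp_sum_pooling` and all data are hypothetical.

```python
import numpy as np

def lrp_linear(w, b, x):
    """Feature relevance for f(x) = w.x + b (assumption: bias shared equally)."""
    return w * x + b / x.size

def lrp_sum_pooling(R_x, phi, eps=1e-9):
    """Pass feature relevance back to frames, assuming x = phi.sum(axis=0).

    Each frame receives relevance in proportion to its share of every
    pooled feature dimension.
    """
    x = phi.sum(axis=0)
    # Signed stabilizer avoids division by zero without flipping the sign.
    denom = x + eps * np.where(x >= 0, 1.0, -1.0)
    return (phi / denom * R_x).sum(axis=1)

# Toy data: 6 frames, each with a 4-dimensional descriptor.
rng = np.random.default_rng(0)
phi = rng.normal(size=(6, 4))
w = rng.normal(size=4)   # hypothetical linear classifier weights
b = 0.1                  # hypothetical bias

x = phi.sum(axis=0)                   # video-level representation
score = float(w @ x + b)              # classifier output
R_x = lrp_linear(w, b, x)             # relevance per feature dimension
R_frames = lrp_sum_pooling(R_x, phi)  # relevance per frame

print("most relevant frame:", int(np.argmax(R_frames)))
```

In this sketch the per-frame relevances sum (up to the stabilizer) to the classifier score, so a frame with a large positive value is one that pushed the decision towards the predicted class.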

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1692-1696
Number of pages: 5
ISBN (Electronic): 9781509041176
DOIs: https://doi.org/10.1109/ICASSP.2017.7952445
Publication status: Published - 2017 Jun 16
Externally published: Yes
Event: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
Duration: 2017 Mar 5 to 2017 Mar 9

Other

Other: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Country: United States
City: New Orleans
Period: 17/3/5 to 17/3/9

Fingerprint

Classifiers
Decoding
Pipelines

Keywords

  • Action recognition
  • compressed domain
  • Fisher vector encoding
  • interpretable classification
  • motion vectors

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Srinivasan, V., Lapuschkin, S., Hellge, C., Muller, K., & Samek, W. (2017). Interpretable human action recognition in compressed domain. In 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings (pp. 1692-1696). [7952445] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP.2017.7952445

@inproceedings{dc2783e6991d47f0b3cfdcca4c82cc2e,
title = "Interpretable human action recognition in compressed domain",
keywords = "Action recognition, compressed domain, fisher vector encoding, interpretable classification, motion vectors",
author = "Vignesh Srinivasan and Sebastian Lapuschkin and Cornelius Hellge and Klaus Muller and Wojciech Samek",
year = "2017",
month = "6",
day = "16",
doi = "10.1109/ICASSP.2017.7952445",
language = "English",
pages = "1692--1696",
booktitle = "2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Interpretable human action recognition in compressed domain

AU - Srinivasan, Vignesh

AU - Lapuschkin, Sebastian

AU - Hellge, Cornelius

AU - Muller, Klaus

AU - Samek, Wojciech

PY - 2017/6/16

Y1 - 2017/6/16

KW - Action recognition

KW - compressed domain

KW - fisher vector encoding

KW - interpretable classification

KW - motion vectors

UR - http://www.scopus.com/inward/record.url?scp=85023764863&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85023764863&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2017.7952445

DO - 10.1109/ICASSP.2017.7952445

M3 - Conference contribution

AN - SCOPUS:85023764863

SP - 1692

EP - 1696

BT - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -