Prediction of partially observed human activity based on pre-trained deep representation

Dong Gyu Lee, Seong Whan Lee

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Prediction of complex human activities from a partially observed video is valuable in many practical applications but is a challenging problem. When a video is partially observed, maximizing the representational power of the given video is more important than modeling the temporal dynamics of the activity. In this paper, we propose a novel human activity descriptor for prediction, which can maximize the discriminative power of a system in a compact and efficient way using pre-trained deep networks. Specifically, the proposed descriptor can capture the potentially important pairwise relationships between objects without prior knowledge or preset attributes. The relationship information is automatically reflected during the descriptor construction procedure based on objects' participation ratios and local and global motion activations. Pre-trained convolutional neural networks are used without any additional model training. From a practical point of view, the proposed method is more cost-effective when implementing a smart surveillance system. In the experiments, we evaluate the proposed method in two cases: (1) prediction accuracy with different observation ratios, and (2) the effect of pre-trained network and layer selection. Experimental results on five public datasets verify the efficacy of the proposed method, which outperforms competing methods with stable, high performance regardless of network selection.

Original language: English
Pages (from-to): 198-206
Number of pages: 9
Journal: Pattern Recognition
Volume: 85
DOI: 10.1016/j.patcog.2018.08.006
Publication status: Published - 2019 Jan 1

Keywords

  • Human activity prediction
  • Human interaction
  • Pre-trained CNN
  • Sub-volume co-occurrence matrix

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Prediction of partially observed human activity based on pre-trained deep representation. / Lee, Dong Gyu; Lee, Seong Whan.

In: Pattern Recognition, Vol. 85, 01.01.2019, p. 198-206.

@article{1eb99eccaf3c4a4f96f32f79bf26d4f7,
title = "Prediction of partially observed human activity based on pre-trained deep representation",
abstract = "Prediction of complex human activities from a partially observed video is valuable in many practical applications but is a challenging problem. When a video is partially observed, maximizing the representational power of the given video is more important than modeling the temporal dynamics of the activity. In this paper, we propose a novel human activity descriptor for prediction, which can maximize the discriminative power of a system in a compact and efficient way using pre-trained deep networks. Specifically, the proposed descriptor can capture the potentially important pairwise relationships between objects without prior knowledge or preset attributes. The relationship information is automatically reflected during the descriptor construction procedure based on objects' participation ratios and local and global motion activations. Pre-trained convolutional neural networks are used without any additional model training. From a practical point of view, the proposed method is more cost-effective when implementing a smart surveillance system. In the experiments, we evaluate the proposed method in two cases: (1) prediction accuracy with different observation ratios, and (2) the effect of pre-trained network and layer selection. Experimental results on five public datasets verify the efficacy of the proposed method, which outperforms competing methods with stable, high performance regardless of network selection.",
keywords = "Human activity prediction, Human interaction, Pre-trained CNN, Sub-volume co-occurrence matrix",
author = "Lee, {Dong Gyu} and Lee, {Seong Whan}",
year = "2019",
month = "1",
day = "1",
doi = "10.1016/j.patcog.2018.08.006",
language = "English",
volume = "85",
pages = "198--206",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",
}

TY - JOUR

T1 - Prediction of partially observed human activity based on pre-trained deep representation

AU - Lee, Dong Gyu

AU - Lee, Seong Whan

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Prediction of complex human activities from a partially observed video is valuable in many practical applications but is a challenging problem. When a video is partially observed, maximizing the representational power of the given video is more important than modeling the temporal dynamics of the activity. In this paper, we propose a novel human activity descriptor for prediction, which can maximize the discriminative power of a system in a compact and efficient way using pre-trained deep networks. Specifically, the proposed descriptor can capture the potentially important pairwise relationships between objects without prior knowledge or preset attributes. The relationship information is automatically reflected during the descriptor construction procedure based on objects' participation ratios and local and global motion activations. Pre-trained convolutional neural networks are used without any additional model training. From a practical point of view, the proposed method is more cost-effective when implementing a smart surveillance system. In the experiments, we evaluate the proposed method in two cases: (1) prediction accuracy with different observation ratios, and (2) the effect of pre-trained network and layer selection. Experimental results on five public datasets verify the efficacy of the proposed method, which outperforms competing methods with stable, high performance regardless of network selection.

KW - Human activity prediction

KW - Human interaction

KW - Pre-trained CNN

KW - Sub-volume co-occurrence matrix

UR - http://www.scopus.com/inward/record.url?scp=85052322974&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052322974&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2018.08.006

DO - 10.1016/j.patcog.2018.08.006

M3 - Article

AN - SCOPUS:85052322974

VL - 85

SP - 198

EP - 206

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

ER -