Volumetric spatial feature representation for view-invariant human action recognition using a depth camera

Seong Sik Cho, A. Reum Lee, Heung-Il Suk, Jeong Seon Park, Seong Whan Lee

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Viewpoint variation is a challenging issue in vision-based human action recognition. With the richer information provided by three-dimensional (3-D) point clouds, available thanks to the advent of 3-D depth cameras, we can effectively analyze spatial variations in human actions. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth image sequences. Using VSFR, we construct a self-similarity matrix (SSM) that graphically represents temporal variations in the depth sequence. To obtain an SSM, we compute the squared Euclidean distance between the VSFRs of each pair of frames in a video sequence. In this manner, an SSM represents the dissimilarity between pairs of frames in terms of spatial information in a video sequence captured at an arbitrary viewpoint. Furthermore, because it uses a bag-of-features method for feature representation, the proposed method efficiently handles variations in action speed and length. Hence, our method is robust to variations in both viewpoint and action sequence length. We evaluated the proposed method against state-of-the-art methods on three public datasets, ACT42, MSRAction3D, and MSRDailyActivity3D, and validated its superiority by achieving the highest accuracies.
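
Illustrative sketch (not part of the original record): the abstract describes computing a per-frame volumetric density descriptor (VSFR) from the depth camera's 3-D point cloud and building a self-similarity matrix whose entry (i, j) is the squared Euclidean distance ||v_i - v_j||^2 between the descriptors of frames i and j. The minimal Python sketch below shows that idea under assumed choices; the voxel grid size, scene bounds, and function names (voxel_density_feature, self_similarity_matrix) are hypothetical, and the bag-of-features encoding and classifier used in the paper are omitted.

import numpy as np

def voxel_density_feature(points, grid=(10, 10, 10), bounds=((0, 2), (0, 2), (0, 4))):
    """Count 3-D points per voxel of a fixed grid and L1-normalize the histogram.

    points : (N, 3) array of (x, y, z) coordinates from one depth frame.
    Returns a flat density vector (a VSFR-like descriptor for that frame).
    Grid resolution and scene bounds are illustrative assumptions.
    """
    hist, _ = np.histogramdd(points, bins=grid, range=bounds)
    hist = hist.ravel().astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist

def self_similarity_matrix(features):
    """Squared Euclidean distance between every pair of per-frame feature vectors.

    features : (T, D) array, one descriptor per frame.
    Returns a (T, T) matrix; entry (i, j) is the frame-to-frame dissimilarity.
    """
    sq_norms = np.sum(features ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by floating-point error

# Toy usage: 30 synthetic frames of 500 random points each.
rng = np.random.default_rng(0)
frames = [rng.uniform((0, 0, 0), (2, 2, 4), size=(500, 3)) for _ in range(30)]
vsfr = np.stack([voxel_density_feature(p) for p in frames])
ssm = self_similarity_matrix(vsfr)
print(ssm.shape)  # (30, 30)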

Original language: English
Article number: 033102
Journal: Optical Engineering
Volume: 54
Issue number: 3
DOI: 10.1117/1.OE.54.3.033102
Publication status: Published - 2015 Mar 1

Keywords

  • action recognition
  • depth camera
  • point clouds
  • view invariance
  • volumetric spatial feature representation

ASJC Scopus subject areas

  • Atomic and Molecular Physics, and Optics
  • Engineering(all)

Cite this

Volumetric spatial feature representation for view-invariant human action recognition using a depth camera. / Cho, Seong Sik; Lee, A. Reum; Suk, Heung-Il; Park, Jeong Seon; Lee, Seong Whan.

In: Optical Engineering, Vol. 54, No. 3, 033102, 01.03.2015.

Research output: Contribution to journal › Article

@article{6b5ed2fcde624a2fadcf867f27dd5efe,
title = "Volumetric spatial feature representation for view-invariant human action recognition using a depth camera",
abstract = "The problem of viewpoint variations is a challenging issue in vision-based human action recognition. With the richer information provided by three-dimensional (3-D) point clouds thanks to the advent of 3-D depth cameras, we can effectively analyze spatial variations in human actions. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth sequence images. Using VSFR, we construct a self-similarity matrix (SSM) that can graphically represent temporal variations in the depth sequence. To obtain an SSM, we compute the squared Euclidean distance of VSFRs between a pair of frames in a video sequence. In this manner, an SSM represents the dissimilarity between a pair of frames in terms of spatial information in a video sequence captured at an arbitrary viewpoint. Furthermore, due to the use of a bag-of-features method for feature representations, the proposed method efficiently handles the variations of action speed or length. Hence, our method is robust to both variations in viewpoints and lengths of action sequences. We evaluated the proposed method by comparing with state-of-the-art methods in the literature on three public datasets of ACT42, MSRAction3D, and MSRDailyActivity3D, validating the superiority of our method by achieving the highest accuracies.",
keywords = "action recognition, depth camera, point clouds, view invariance, volumetric spatial feature representation",
author = "Cho, {Seong Sik} and Lee, {A. Reum} and Suk, {Heung-Il} and Park, {Jeong Seon} and Lee, {Seong Whan}",
year = "2015",
month = "3",
day = "1",
doi = "10.1117/1.OE.54.3.033102",
language = "English",
volume = "54",
journal = "Optical Engineering",
issn = "0091-3286",
publisher = "SPIE",
number = "3",

}

TY - JOUR

T1 - Volumetric spatial feature representation for view-invariant human action recognition using a depth camera

AU - Cho, Seong Sik

AU - Lee, A. Reum

AU - Suk, Heung-Il

AU - Park, Jeong Seon

AU - Lee, Seong Whan

PY - 2015/3/1

Y1 - 2015/3/1

N2 - The problem of viewpoint variations is a challenging issue in vision-based human action recognition. With the richer information provided by three-dimensional (3-D) point clouds thanks to the advent of 3-D depth cameras, we can effectively analyze spatial variations in human actions. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth sequence images. Using VSFR, we construct a self-similarity matrix (SSM) that can graphically represent temporal variations in the depth sequence. To obtain an SSM, we compute the squared Euclidean distance of VSFRs between a pair of frames in a video sequence. In this manner, an SSM represents the dissimilarity between a pair of frames in terms of spatial information in a video sequence captured at an arbitrary viewpoint. Furthermore, due to the use of a bag-of-features method for feature representations, the proposed method efficiently handles the variations of action speed or length. Hence, our method is robust to both variations in viewpoints and lengths of action sequences. We evaluated the proposed method by comparing with state-of-the-art methods in the literature on three public datasets of ACT42, MSRAction3D, and MSRDailyActivity3D, validating the superiority of our method by achieving the highest accuracies.

AB - The problem of viewpoint variations is a challenging issue in vision-based human action recognition. With the richer information provided by three-dimensional (3-D) point clouds thanks to the advent of 3-D depth cameras, we can effectively analyze spatial variations in human actions. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth sequence images. Using VSFR, we construct a self-similarity matrix (SSM) that can graphically represent temporal variations in the depth sequence. To obtain an SSM, we compute the squared Euclidean distance of VSFRs between a pair of frames in a video sequence. In this manner, an SSM represents the dissimilarity between a pair of frames in terms of spatial information in a video sequence captured at an arbitrary viewpoint. Furthermore, due to the use of a bag-of-features method for feature representations, the proposed method efficiently handles the variations of action speed or length. Hence, our method is robust to both variations in viewpoints and lengths of action sequences. We evaluated the proposed method by comparing with state-of-the-art methods in the literature on three public datasets of ACT42, MSRAction3D, and MSRDailyActivity3D, validating the superiority of our method by achieving the highest accuracies.

KW - action recognition

KW - depth camera

KW - point clouds

KW - view invariance

KW - volumetric spatial feature representation

UR - http://www.scopus.com/inward/record.url?scp=84923886781&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84923886781&partnerID=8YFLogxK

U2 - 10.1117/1.OE.54.3.033102

DO - 10.1117/1.OE.54.3.033102

M3 - Article

VL - 54

JO - Optical Engineering

JF - Optical Engineering

SN - 0091-3286

IS - 3

M1 - 033102

ER -