Volumetric spatial feature representation for view-invariant human action recognition using a depth camera

Seong Sik Cho, A. Reum Lee, Heung Il Suk, Jeong Seon Park, Seong Whan Lee

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)


The problem of viewpoint variation is a challenging issue in vision-based human action recognition. With the richer information provided by three-dimensional (3-D) point clouds, made possible by the advent of 3-D depth cameras, we can effectively analyze spatial variations in human actions. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth image sequences. Using VSFRs, we construct a self-similarity matrix (SSM) that graphically represents temporal variations in the depth sequence. To obtain the SSM, we compute the squared Euclidean distance between the VSFRs of every pair of frames in a video sequence. In this manner, the SSM represents the spatial dissimilarity between pairs of frames in a video captured from an arbitrary viewpoint. Furthermore, because we use a bag-of-features method for feature representation, the proposed method efficiently handles variations in action speed and length. Hence, our method is robust to variations in both viewpoint and sequence length. We evaluated the proposed method against state-of-the-art methods on three public datasets (ACT42, MSRAction3D, and MSRDailyActivity3D), where it achieved the highest accuracy.
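The two core steps described in the abstract — a per-frame voxel-density feature and a self-similarity matrix of squared Euclidean distances between frames — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the voxel grid size, the bounds of the unit cube, and the function names (`vsfr`, `self_similarity_matrix`) are assumptions chosen for clarity.

```python
import numpy as np

def vsfr(points, grid=(8, 8, 8), bounds=((0, 1), (0, 1), (0, 1))):
    """Volumetric spatial feature: normalized point-cloud density per voxel.

    points: (N, 3) array of 3-D points from one depth frame.
    grid and bounds are illustrative choices, not the paper's settings.
    """
    hist, _ = np.histogramdd(points, bins=grid, range=bounds)
    total = hist.sum()
    return (hist / total if total else hist).ravel()

def self_similarity_matrix(features):
    """SSM[i, j] = squared Euclidean distance between frame i and j features."""
    f = np.asarray(features)                       # (T, D): T frames, D voxels
    sq = (f ** 2).sum(axis=1)                      # per-frame squared norms
    ssm = sq[:, None] + sq[None, :] - 2.0 * f @ f.T
    return np.maximum(ssm, 0.0)                    # clip tiny negative round-off

# Toy example: four random "frames" of 3-D points in the unit cube.
rng = np.random.default_rng(0)
frames = [rng.random((500, 3)) for _ in range(4)]
ssm = self_similarity_matrix([vsfr(p) for p in frames])
```

The resulting `ssm` is a symmetric T-by-T matrix with zeros on the diagonal; the paper then applies a bag-of-features representation on top of such matrices to absorb differences in action speed and length.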

Original language: English
Article number: 141699
Journal: Optical Engineering
Issue number: 3
Publication status: Published - 2015 Mar 1


Keywords

  • action recognition
  • depth camera
  • point clouds
  • view invariance
  • volumetric spatial feature representation

ASJC Scopus subject areas

  • Atomic and Molecular Physics, and Optics
  • Engineering(all)

