
STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection

William McNally, Alexander Wong, John McPhee

Published: 2019-02-26
Tasks: Skeleton Based Action Recognition, Pose Estimation, Multimodal Activity Recognition, Action Recognition, Temporal Action Localization

Abstract

While depth cameras and inertial sensors have been frequently leveraged for human action recognition, these sensing modalities are impractical in many scenarios where cost or environmental constraints prohibit their use. As such, there has been recent interest in human action recognition using low-cost, readily available RGB cameras via deep convolutional neural networks. However, many of the deep convolutional neural networks proposed for action recognition thus far have relied heavily on learning global appearance cues directly from imaging data, resulting in highly complex network architectures that are computationally expensive and difficult to train. Motivated to reduce network complexity and achieve higher performance, we introduce the concept of spatio-temporal activation reprojection (STAR). More specifically, we reproject the spatio-temporal activations generated by human pose estimation layers in space and time using a stack of 3D convolutions. Experimental results on UTD-MHAD and J-HMDB demonstrate that an end-to-end architecture based on the proposed STAR framework (which we nickname STAR-Net) is proficient in single-environment and small-scale applications. On UTD-MHAD, STAR-Net outperforms several methods using richer data modalities such as depth and inertial sensors.
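The core idea in the abstract is to treat the heatmap activations of a 2D pose-estimation backbone as the input representation, stack them over time into a spatio-temporal volume, and process that volume with a stack of 3D convolutions. Below is a minimal PyTorch sketch of that pipeline; the layer widths, joint count, and class count (21, matching J-HMDB) are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class STARNetSketch(nn.Module):
    """Illustrative sketch of the STAR idea: per-frame pose heatmaps
    stacked over time, then processed by a stack of 3D convolutions.
    All layer sizes are assumptions, not the published architecture."""

    def __init__(self, num_joints=15, num_classes=21):
        super().__init__()
        # Hypothetical 3D-convolution stack over the spatio-temporal
        # activation volume of shape (joints, T, H, W).
        self.reproject = nn.Sequential(
            nn.Conv3d(num_joints, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)      # global space-time pool
        self.classify = nn.Linear(128, num_classes)

    def forward(self, heatmaps):
        # heatmaps: (batch, joints, T, H, W) -- per-frame activations
        # produced by a 2D pose-estimation backbone.
        x = self.reproject(heatmaps)
        x = self.pool(x).flatten(1)
        return self.classify(x)

# Example: a batch of 2 clips, 8 frames of 64x64 heatmaps for 15 joints.
model = STARNetSketch()
logits = model(torch.randn(2, 15, 8, 64, 64))
print(logits.shape)  # torch.Size([2, 21])
```

Given per-frame heatmaps of shape (batch, joints, frames, height, width), the 3D convolutions mix information across both space and time before a global pool and linear classifier produce the action logits.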

Results

Task | Dataset | Metric | Value (%) | Model
Video | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Temporal Action Localization | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Zero-Shot Learning | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Activity Recognition | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Activity Recognition | UTD-MHAD | Accuracy (cross-subject) | 90 | STAR-Net
Action Localization | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Action Detection | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
3D Action Recognition | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Action Recognition | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)