Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox

2017-04-03 | ICCV 2017
Tasks: Action Classification, Action Localization, Skeleton Based Action Recognition, Spatio-Temporal Action Localization, General Classification, Action Recognition, Temporal Action Localization

Abstract

General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.
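
The chained integration described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows the general idea of adding cues one at a time, where each stage sees its own cue plus the hidden state of the stage before it (the Markov property) and emits its own class prediction. The names `ChainStage`, `build_chain`, and `run_chain`, the feature sizes, the random untrained weights, and the cue order (pose, then motion, then appearance, following the order in which the abstract lists them) are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class ChainStage:
    """Hypothetical chain link: fuses one new cue with the previous hidden state."""

    def __init__(self, cue_dim, prev_dim, hidden_dim, num_classes, rng):
        in_dim = cue_dim + prev_dim
        # Random, untrained weights; a real model would learn these.
        self.w_h = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b_h = [0.0] * hidden_dim
        self.w_y = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(num_classes)]
        self.b_y = [0.0] * num_classes

    def forward(self, cue, prev_hidden):
        # Concatenate the new cue with what the previous stage passed along.
        x = list(cue) + list(prev_hidden)
        hidden = [max(0.0, v) for v in linear(self.w_h, self.b_h, x)]  # ReLU
        probs = softmax(linear(self.w_y, self.b_y, hidden))
        return hidden, probs


def build_chain(cue_dims, hidden_dim, num_classes, seed=0):
    """Create one stage per cue; the first stage has no predecessor."""
    rng = random.Random(seed)
    stages = []
    prev_dim = 0
    for cue_dim in cue_dims:
        stages.append(ChainStage(cue_dim, prev_dim, hidden_dim, num_classes, rng))
        prev_dim = hidden_dim
    return stages


def run_chain(stages, cues):
    """Add cues successively; each stage depends only on the previous stage's state."""
    hidden = []
    predictions = []
    for stage, cue in zip(stages, cues):
        hidden, probs = stage.forward(cue, hidden)
        predictions.append(probs)
    return predictions


if __name__ == "__main__":
    rng = random.Random(1)
    cue_dims = [8, 6, 10]  # assumed feature sizes for pose, motion, appearance
    stages = build_chain(cue_dims, hidden_dim=5, num_classes=4)
    cues = [[rng.random() for _ in range(d)] for d in cue_dims]
    for name, probs in zip(["pose", "motion", "appearance"], run_chain(stages, cues)):
        print(name, [round(p, 3) for p in probs])
```

Because every stage produces its own prediction, a chain of this shape can be read out after any cue, and the last stage's output reflects all cues seen so far. How the per-stage predictions are combined in the actual paper is not stated in the abstract, so the sketch simply returns all of them.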

Results

The same three results are listed under each of these tasks: Video, Temporal Action Localization, Zero-Shot Learning, Activity Recognition, Action Localization, Action Detection, 3D Action Recognition, Action Recognition.

Dataset	Metric	Value	Model
JHMDB (2D poses only)	Average accuracy of 3 splits	56.8	Chained
J-HMDB	Accuracy (RGB+pose)	76.1	Chained (RGB+Flow+Pose)
J-HMDB	Accuracy (pose)	56.8	Chained (RGB+Flow+Pose)
