Pose And Joint-Aware Action Recognition

Anshul Shah, Shlok Mishra, Ankan Bansal, Jun-Cheng Chen, Rama Chellappa, Abhinav Shrivastava

2020-10-16

Tasks: Action Classification, Optical Flow Estimation, Skeleton Based Action Recognition, Data Augmentation, Action Recognition, Action Recognition In Videos, Temporal Action Localization, Activity Recognition


Abstract

Recent progress on action recognition has mainly focused on RGB and optical flow features. In this paper, we approach the problem of joint-based action recognition. Unlike other modalities, the constellation of joints and their motion yields a succinct representation of human motion for activity recognition. We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We also propose a novel joint-contrastive loss that pulls together groups of joint features which convey the same action. We strengthen the joint-based representations by using a geometry-aware data augmentation technique which jitters pose heatmaps while retaining the dynamics of the action. We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets. Late fusion with RGB- and flow-based approaches yields additional improvements. Our model also outperforms the existing baseline on Mimetics, a dataset with out-of-context actions.
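The abstract names three trainable pieces (a shared per-joint motion encoder, a joint selector that re-weights joints, and a joint-contrastive loss) plus a geometry-aware augmentation. The NumPy sketch below illustrates one plausible form of each under stated assumptions; it is not the authors' JMRN implementation. Assumed here: poses arrive as (T, J, 2) keypoint arrays rather than heatmaps; the encoder is a single shared linear+ReLU layer; the selector is a softmax over per-joint scores; the loss is a supervised-contrastive (SupCon-style) objective applied to pooled clip features rather than to groups of joint features; and the augmentation is one random affine transform shared by all frames. All function and parameter names are hypothetical.

```python
import numpy as np


def jitter_pose_sequence(keypoints, rng, max_angle=0.1, max_scale=0.1, max_shift=0.05):
    """Apply ONE random rotation/scale/shift to every frame of a pose sequence.

    keypoints: (T, J, 2) array of 2D joint coordinates (hypothetical input format).
    Because the same transform is shared by all frames, the pose geometry is
    perturbed while the frame-to-frame motion pattern is kept.
    """
    angle = rng.uniform(-max_angle, max_angle)
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shift = rng.uniform(-max_shift, max_shift, size=2)
    c, s = np.cos(angle), np.sin(angle)
    A = scale * np.array([[c, -s], [s, c]])
    center = keypoints.reshape(-1, 2).mean(axis=0)
    return (keypoints - center) @ A.T + center + shift


def shared_motion_encoder(joint_tracks, W, b):
    """Encode each joint's trajectory with the SAME weights (one row per joint).

    joint_tracks: (J, T*2); W: (T*2, D); b: (D,). Returns (J, D) features.
    """
    return np.maximum(joint_tracks @ W + b, 0.0)


def joint_selector(joint_feats, w_score):
    """Score each joint, softmax the scores, and return the weighted sum."""
    scores = joint_feats @ w_score              # (J,)
    scores = scores - scores.max()              # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ joint_feats              # (D,)
    return pooled, weights


def joint_contrastive_loss(feats, labels, temperature=0.1):
    """SupCon-style loss: pull same-label features together, push others apart."""
    z = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)  # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = positives.any(axis=1)
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[has_pos] / positives.sum(axis=1)[has_pos]).mean())


# Toy usage with random data (shapes and sizes are arbitrary).
rng = np.random.default_rng(0)
T, J, D = 16, 15, 32
W = rng.normal(scale=0.1, size=(T * 2, D))
b = np.zeros(D)
w_score = rng.normal(size=D)

clips = rng.normal(size=(4, T, J, 2))
labels = np.array([0, 0, 1, 1])
pooled = []
for clip in clips:
    aug = jitter_pose_sequence(clip, rng)
    tracks = aug.transpose(1, 0, 2).reshape(J, T * 2)  # one row per joint
    vec, weights = joint_selector(shared_motion_encoder(tracks, W, b), w_score)
    pooled.append(vec)
pooled = np.stack(pooled)
loss = joint_contrastive_loss(pooled, labels)
```

Sharing the encoder weights across joints keeps the parameter count independent of the number of joints, and applying one transform to every frame changes the pose geometry without altering the relative timing of the motion.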

Results

Dataset	Metric	Value	Model	Listed under tasks
JHMDB (2D poses only)	Average accuracy of 3 splits (%)	68.55	JMRN (no ground-truth pose)	Video; Action Recognition; Activity Recognition; Temporal Action Localization; Action Localization; Action Detection; 3D Action Recognition; Zero-Shot Learning
HMDB-51	Average accuracy of 3 splits (%)	84.53	JMRN + ResNeXt101 BERT	Action Recognition; Activity Recognition
HMDB-51	Average accuracy of 3 splits (%)	54.2	JMRN	Action Recognition; Activity Recognition
Charades	mAP	43.23	JMRN + R101-NL-LFB	Video
Charades	mAP	16.2	JMRN (pose only)	Video
AVA v2.1	mAP (val)	28.4	JMRN + SlowFast-R101-NL	Action Recognition; Activity Recognition
Mimetics	mAP	40	JMRN	Action Recognition; Activity Recognition
Mimetics	mAP	38.3	SIP-Net	Action Recognition; Activity Recognition
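Several rows above (for example, JMRN + SlowFast-R101-NL) report late fusion of the pose-based model with an RGB-based model. The exact fusion rule is not given on this page; a common choice, assumed in this sketch, is a weighted average of per-class softmax probabilities from independently trained streams. The weight `alpha`, the toy logits, and all function names are hypothetical. The sketch also shows that "Average accuracy of 3 splits" is simply the mean of the per-split accuracies (the split values used are illustrative, not results from the paper).

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over class logits of shape (N, C)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def late_fusion(pose_logits, rgb_logits, alpha=0.5):
    """Weighted average of class probabilities from two separately trained
    streams; alpha is a hypothetical weight, e.g. tuned on validation data."""
    return alpha * softmax(pose_logits) + (1.0 - alpha) * softmax(rgb_logits)


def accuracy(scores, labels):
    """Top-1 accuracy of (N, C) class scores against integer labels."""
    return float((scores.argmax(axis=1) == labels).mean())


def mean_over_splits(split_accuracies):
    """'Average accuracy of 3 splits': the plain mean of per-split accuracies."""
    return float(np.mean(split_accuracies))


# Toy example: 3 clips, 3 classes. Each stream alone misclassifies one clip,
# but the streams make different mistakes, so the fused scores get all 3 right.
labels = np.array([0, 1, 2])
pose_logits = np.array([[4.0, 0.0, 0.0], [0.0, 1.0, 1.2], [0.0, 0.0, 4.0]])
rgb_logits = np.array([[1.0, 1.2, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])
fused = late_fusion(pose_logits, rgb_logits)

print(accuracy(pose_logits, labels), accuracy(rgb_logits, labels), accuracy(fused, labels))
print(mean_over_splits([0.5, 0.6, 0.7]))  # illustrative split accuracies
```

Averaging probabilities rather than raw logits keeps the two streams on a common scale even when their logits have very different magnitudes.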
