Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks for Skeleton-based Action Recognition

Jialin Gao, Tong He, Xi Zhou, Shiming Ge

2019-12-24 · Skeleton Based Action Recognition · Action Recognition

Paper · PDF

Abstract

A collection of approaches based on graph convolutional networks has proven successful in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints. However, these approaches usually ignore the spatial-temporal global context as well as the local relations between inter-frame and intra-frame joints. In this paper, we propose a focusing and diffusion mechanism to enhance graph convolutional networks by attending to the kinematic dependencies of the articulated human pose within a frame and their implicit dependencies across frames. In the focusing process, we introduce an attention module that learns a latent node over the intra-frame joints to convey spatial contextual information. In this way, the sparse connections between joints in a frame are well captured, while the global context over the entire sequence is further captured by passing these hidden nodes through a bidirectional LSTM. In the diffusion process, the learned spatial-temporal contextual information is passed back to the spatial joints, yielding a bidirectional attentive graph convolutional network (BAGCN) that facilitates skeleton-based action recognition. Extensive experiments on the challenging NTU RGB+D and Skeleton-Kinetics benchmarks demonstrate the efficacy of our approach.
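To make the mechanism concrete, here is a minimal, hypothetical PyTorch sketch of the focusing and diffusion steps as the abstract describes them: attention pools the V joints of each frame into one latent node (focusing), a bidirectional LSTM carries context across the T latent nodes, and that context is broadcast back onto every joint (diffusion). The module name `FocusDiffuse`, the additive-softmax attention form, and all layer sizes are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the focusing-and-diffusion mechanism from the
# abstract; shapes and layer choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocusDiffuse(nn.Module):
    """Focusing: attention pools V joint features into one latent node per
    frame. A bidirectional LSTM over the T latent nodes gathers global
    spatial-temporal context. Diffusion: the context is broadcast back to
    every joint and fused with the original joint features."""
    def __init__(self, in_dim=64, ctx_dim=64):
        super().__init__()
        self.score = nn.Linear(in_dim, 1)            # attention score per joint
        self.bilstm = nn.LSTM(in_dim, ctx_dim // 2,
                              batch_first=True, bidirectional=True)
        self.fuse = nn.Linear(in_dim + ctx_dim, in_dim)

    def forward(self, x):
        # x: (N, T, V, C) -- batch, frames, joints, channels
        a = F.softmax(self.score(x), dim=2)          # (N, T, V, 1), over joints
        latent = (a * x).sum(dim=2)                  # focusing: (N, T, C)
        ctx, _ = self.bilstm(latent)                 # global context: (N, T, ctx_dim)
        ctx = ctx.unsqueeze(2).expand(-1, -1, x.size(2), -1)
        return self.fuse(torch.cat([x, ctx], dim=-1))  # diffusion back to joints

if __name__ == "__main__":
    feats = torch.randn(2, 30, 25, 64)   # e.g. NTU RGB+D skeletons: 25 joints
    print(FocusDiffuse()(feats).shape)   # torch.Size([2, 30, 25, 64])
```

In a full model, a block like this would sit between graph-convolution layers so each joint's feature is refined by sequence-level context before the next spatial convolution; that placement is likewise an assumption here.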

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Video | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Activity Recognition | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Activity Recognition | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Action Localization | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Action Localization | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Action Detection | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Action Detection | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Action Recognition | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Action Recognition | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)