Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu
In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeleton as a spatiotemporal graph, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually and is fixed over all layers and input samples, which may not be optimal for the hierarchical GCN and the diverse samples in action recognition tasks. In addition, the second-order information of the skeleton data (the lengths and directions of bones), which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned through backpropagation in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and gives it the generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which brings a notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGB+D and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin.
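To make the "second-order information" concrete: the bone stream is derived from the joint coordinates by taking, for each joint, the vector from its parent joint to itself, which encodes both bone length and direction. The following is a minimal sketch of that preprocessing step, assuming a toy 5-joint skeleton hierarchy (the `PARENTS` list is illustrative, not the actual NTU RGB+D joint layout):

```python
import numpy as np

# Hypothetical 5-joint skeleton: joint i's parent is PARENTS[i];
# joint 0 is its own parent (the root). This is NOT the NTU RGB+D hierarchy.
PARENTS = [0, 0, 1, 2, 3]

def joints_to_bones(joints):
    """Derive the second-order (bone) stream from joint coordinates.

    joints: array of shape (T, V, C) -- frames, joints, coordinate channels.
    Each bone vector points from a joint's parent to the joint itself,
    so it carries the bone's length and direction.
    """
    bones = np.zeros_like(joints)
    for v, p in enumerate(PARENTS):
        bones[:, v, :] = joints[:, v, :] - joints[:, p, :]
    return bones

T, V, C = 4, 5, 3
joints = np.random.randn(T, V, C)
bones = joints_to_bones(joints)
# The root's bone is zero by construction; all other entries hold
# parent-to-joint displacement vectors.
```

In the two-stream framework, the joint stream and this bone stream are fed to two networks of the same architecture and their softmax scores are fused for the final prediction.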
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Skeleton-Based Action Recognition | Assembly101 | Actions Top-1 (%) | 26.7 | 2s-AGCN |
| Skeleton-Based Action Recognition | Assembly101 | Objects Top-1 (%) | 33.8 | 2s-AGCN |
| Skeleton-Based Action Recognition | Assembly101 | Verbs Top-1 (%) | 64.4 | 2s-AGCN |
| Skeleton-Based Action Recognition | UAV-Human | CSv1 (%) | 34.84 | 2s-AGCN |
| Skeleton-Based Action Recognition | UAV-Human | CSv2 (%) | 66.68 | 2s-AGCN |
| Skeleton-Based Action Recognition | NTU RGB+D | Accuracy, Cross-Subject (%) | 88.5 | 2s-NLGCN |
| Skeleton-Based Action Recognition | NTU RGB+D | Accuracy, Cross-View (%) | 95.1 | 2s-NLGCN |