Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition

Fanfan Ye, ShiLiang Pu, Qiaoyong Zhong, Chao Li, Di Xie, Huiming Tang

2020-07-29 | Skeleton Based Action Recognition | Action Recognition

Abstract

Graph Convolutional Networks (GCNs) have attracted increasing interest for the task of skeleton-based action recognition. The key lies in the design of the graph structure, which encodes skeleton topology information. In this paper, we propose Dynamic GCN, in which a novel convolutional neural network named Context-encoding Network (CeN) is introduced to learn skeleton topology automatically. In particular, when learning the dependency between two joints, contextual features from the remaining joints are incorporated in a global manner. CeN is extremely lightweight yet effective, and can be embedded into a graph convolutional layer. By stacking multiple CeN-enabled graph convolutional layers, we build Dynamic GCN. Notably, as a merit of CeN, dynamic graph topologies are constructed for different input samples as well as for graph convolutional layers of various depths. In addition, three alternative context modeling architectures are explored, which may serve as a guideline for future research on graph topology learning. CeN adds only ~7% extra FLOPs to the baseline model, and Dynamic GCN achieves better performance with $2\times$ to $4\times$ fewer FLOPs than existing methods. By further combining static physical body connections and motion modalities, we achieve state-of-the-art performance on three large-scale benchmarks, namely NTU-RGB+D, NTU-RGB+D 120 and Skeleton-Kinetics.
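To make the idea of a sample-dependent topology concrete, the following is a minimal NumPy sketch of a graph convolution whose adjacency matrix is predicted from the input itself. It is not the paper's CeN: the function names, the two-matrix scoring scheme (`w_joint`, `w_feat`), the row-wise softmax, and all shapes are assumptions chosen for illustration. The only property it is meant to show is that each predicted edge weight depends on the features of all joints, so different samples yield different graphs.

```python
import numpy as np


def softmax_rows(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_adjacency(x, w_joint, w_feat):
    """Predict an (N, N) adjacency from joint features x of shape (N, C).

    Hypothetical scoring scheme (an assumption, not the paper's CeN):
    w_feat maps each joint's C features to N scores, and w_joint mixes
    those scores across joints, so every entry of the result depends on
    the features of all joints.
    """
    logits = w_joint @ (x @ w_feat)   # (N, N) @ ((N, C) @ (C, N)) -> (N, N)
    return softmax_rows(logits)       # each row sums to 1


def dynamic_graph_conv(x, w_joint, w_feat, w_out):
    """One graph convolution using the adjacency predicted from x."""
    adj = predict_adjacency(x, w_joint, w_feat)
    return adj @ x @ w_out, adj       # aggregate over joints, then project


rng = np.random.default_rng(0)
num_joints, in_ch, out_ch = 25, 3, 8  # 25 joints, as in NTU RGB+D skeletons
w_joint = rng.normal(scale=0.1, size=(num_joints, num_joints))
w_feat = rng.normal(scale=0.1, size=(in_ch, num_joints))
w_out = rng.normal(scale=0.1, size=(in_ch, out_ch))

# Two different input samples share the same weights but get different graphs.
sample_a = rng.normal(size=(num_joints, in_ch))
sample_b = rng.normal(size=(num_joints, in_ch))
out_a, adj_a = dynamic_graph_conv(sample_a, w_joint, w_feat, w_out)
out_b, adj_b = dynamic_graph_conv(sample_b, w_joint, w_feat, w_out)
```

In this sketch the weights are shared across samples while the adjacency is recomputed per input. Stacking several such layers, each with its own weights, would also give each depth its own topology, which is the behaviour the abstract attributes to CeN-enabled layers.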

Results

Task	Dataset	Metric	Value	Model
Action Recognition	Kinetics-Skeleton dataset	Accuracy	37.9	Dynamic GCN
Action Recognition	NTU RGB+D	Accuracy (CS)	91.5	Dynamic GCN
Action Recognition	NTU RGB+D	Accuracy (CV)	96	Dynamic GCN
