Dong Yang, Monica Mengqi Li, Hong Fu, Jicong Fan, Zhao Zhang, Howard Leung
Combining skeleton structure with graph convolutional networks has achieved remarkable performance in human action recognition. However, current research focuses on designing basic graphs to represent skeleton data, so the resulting embedding features capture only basic topological information and cannot provide more systematic perspectives on the skeleton data. In this paper, we overcome this limitation with a novel framework that unifies 15 graph embedding features within a graph convolutional network for human action recognition, aiming to make the best use of graph information to distinguish key joints, bones, and body parts in human actions, rather than relying on a single feature or domain. We also thoroughly investigate how to find the best graph features of the skeleton structure for improving action recognition. In addition, the topological information of the skeleton sequence is explored to further enhance performance in a multi-stream framework, and the unified graph features are extracted adaptively during training, which yields further improvements. Our model is validated on three large-scale datasets, namely NTU-RGB+D, Kinetics, and SYSU-3D, and outperforms state-of-the-art methods. Overall, our work unifies graph embedding features to promote systematic research on human action recognition.
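As background for the graph-convolutional backbone the abstract describes, the sketch below shows a single spatial graph-convolution layer applied to a skeleton adjacency matrix. This is a generic GCN layer for illustration only, not the authors' CGCN implementation; the joint count, feature sizes, and function names are assumptions.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A + I (self-loops added), as in
    standard graph-convolution layers."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(X, A_norm, W):
    """One spatial graph convolution: aggregate features from each
    joint's skeletal neighbors, then apply a learned map and ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy skeleton: 5 joints in a chain (e.g. one limb), bones as edges.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_norm = normalize_adjacency(A)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # per-joint input features (3-D coords)
W = rng.standard_normal((3, 8))   # layer weights (random here, learned in practice)
out = gcn_layer(X, A_norm, W)
print(out.shape)  # (5, 8): one 8-dim embedding per joint
```

In a full model, layers like this are stacked and interleaved with temporal convolutions over frames; the paper's contribution is feeding such layers a unified set of graph embedding features rather than a single fixed skeleton graph.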
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Action Recognition | Kinetics-Skeleton | Accuracy (%) | 37.5 | CGCN |
| Action Recognition | NTU RGB+D | Accuracy, cross-subject (%) | 90.3 | CGCN |
| Action Recognition | NTU RGB+D | Accuracy, cross-view (%) | 96.4 | CGCN |