Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition

Konstantinos Papadopoulos, Enjie Ghorbel, Djamila Aouada, Björn Ottersten

2019-12-20
Skeleton Based Action Recognition, Action Recognition

Abstract

This paper extends the Spatial-Temporal Graph Convolutional Network (ST-GCN) for skeleton-based action recognition by introducing two novel modules: the Graph Vertex Feature Encoder (GVFE) and the Dilated Hierarchical Temporal Convolutional Network (DH-TCN). The GVFE module learns vertex features suited to action recognition by encoding raw skeleton data into a new feature space. The DH-TCN module captures both short-term and long-term temporal dependencies using a hierarchical dilated convolutional network. Experiments were conducted on the challenging NTU RGB+D 60 and NTU RGB+D 120 datasets. The results show that the method is competitive with state-of-the-art approaches while using fewer layers and parameters, thus reducing the required training time and memory.
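To illustrate how stacked dilated temporal convolutions can cover both short and long time spans with few layers, here is a minimal sketch in plain Python. It is not the DH-TCN architecture from the paper: the kernel size of 3, the dilation schedule `[1, 2, 4]`, the all-ones kernel, and the function names `dilated_conv1d`, `stack_dilated`, and `receptive_field` are all assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded ('same' length) 1D dilated convolution over a list of floats.

    Assumes an odd kernel length so the output aligns with the input.
    """
    k = len(kernel)
    half = (k // 2) * dilation  # padding needed on each side
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            # Taps are spaced `dilation` frames apart.
            acc += w * padded[t + i * dilation]
        out.append(acc)
    return out


def stack_dilated(signal, kernel, dilations):
    """Apply one dilated convolution per entry in `dilations`, in order."""
    for d in dilations:
        signal = dilated_conv1d(signal, kernel, d)
    return signal


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical settings, not taken from the paper.
KERNEL = [1.0, 1.0, 1.0]
DILATIONS = [1, 2, 4]

# A single impulse at frame 10 of a 21-frame sequence.
impulse = [0.0] * 21
impulse[10] = 1.0

response = stack_dilated(impulse, KERNEL, DILATIONS)
touched = sum(1 for v in response if v != 0.0)

print("receptive field (dilated):  ", receptive_field(3, DILATIONS))
print("receptive field (undilated):", receptive_field(3, [1, 1, 1]))
print("frames reached by impulse:  ", touched)
```

With the same three layers and kernel size, the dilated stack spans more than twice as many frames as the undilated one, while the first layer (dilation 1) still sees adjacent frames. That is the general trade-off a hierarchical dilated design exploits to reach long-term context without adding layers or parameters.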

Results

Task	Dataset	Metric	Value (%)	Model
Video	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Video	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
Temporal Action Localization	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Temporal Action Localization	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
Zero-Shot Learning	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Zero-Shot Learning	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
Activity Recognition	NTU RGB+D 120	Accuracy (Cross-Setup)	78.3	ST-GCN + AS-GCN w/DH-TCN
Activity Recognition	NTU RGB+D 120	Accuracy (Cross-Subject)	79.2	ST-GCN + AS-GCN w/DH-TCN
Activity Recognition	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Activity Recognition	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
Action Localization	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Action Localization	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
Action Detection	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Action Detection	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
3D Action Recognition	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
3D Action Recognition	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
Action Recognition	NTU RGB+D 120	Accuracy (Cross-Setup)	78.3	ST-GCN + AS-GCN w/DH-TCN
Action Recognition	NTU RGB+D 120	Accuracy (Cross-Subject)	79.2	ST-GCN + AS-GCN w/DH-TCN
Action Recognition	NTU RGB+D	Accuracy (CS)	85.3	GVFE + AS-GCN with DH-TCN
Action Recognition	NTU RGB+D	Accuracy (CV)	92.8	GVFE + AS-GCN with DH-TCN
