Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A New Representation of Skeleton Sequences for 3D Action Recognition

Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid

Published: 2017-03-09 · CVPR 2017

Tasks: 3D Action Recognition, Skeleton Based Action Recognition, Multi-Task Learning, Action Recognition, Temporal Action Localization

Abstract

This paper presents a new method for 3D action recognition with skeleton sequences (i.e., 3D trajectories of human skeleton joints). The proposed method first transforms each skeleton sequence into three clips each consisting of several frames for spatial temporal feature learning using deep neural networks. Each clip is generated from one channel of the cylindrical coordinates of the skeleton sequence. Each frame of the generated clips represents the temporal information of the entire skeleton sequence, and incorporates one particular spatial relationship between the joints. The entire clips include multiple frames with different spatial relationships, which provide useful spatial structural information of the human skeleton. We propose to use deep convolutional neural networks to learn long-term temporal information of the skeleton sequence from the frames of the generated clips, and then use a Multi-Task Learning Network (MTLN) to jointly process all frames of the generated clips in parallel to incorporate spatial structural information for action recognition. Experimental results clearly show the effectiveness of the proposed new representation and feature learning method for 3D action recognition.
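The core of the representation is splitting each skeleton sequence into the three channels of its cylindrical coordinates, one clip per channel. A minimal sketch of that decomposition is shown below, assuming a sequence stored as a `(T, J, 3)` array of joint positions; the function name `cylindrical_channels` and the toy input are illustrative, and the paper's full pipeline additionally arranges each channel into clip frames that encode pairwise spatial relations between joints before the CNN and MTLN stages.

```python
import numpy as np

def cylindrical_channels(skeleton):
    """Split a skeleton sequence into its three cylindrical-coordinate channels.

    skeleton: array of shape (T, J, 3) holding (x, y, z) joint positions
              for T frames and J joints.
    Returns three (T, J) arrays: radius rho, azimuth phi, and height z.
    Simplified sketch only: the paper further builds clip frames from each
    channel to capture joint-pair spatial structure.
    """
    x, y, z = skeleton[..., 0], skeleton[..., 1], skeleton[..., 2]
    rho = np.sqrt(x ** 2 + y ** 2)   # radial distance from the vertical axis
    phi = np.arctan2(y, x)           # azimuthal angle in radians
    return rho, phi, z

# Toy sequence: 4 frames, 5 joints, random positions
seq = np.random.rand(4, 5, 3)
rho, phi, z = cylindrical_channels(seq)
print(rho.shape, phi.shape, z.shape)  # each channel is (4, 5)
```

Each of the three returned channels would then be turned into one clip, so the downstream CNN sees the whole sequence's temporal information in every frame of every clip.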

Results

Task | Dataset | Metric | Value | Model
---- | ------- | ------ | ----- | -----
Video | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Video | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
Activity Recognition | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Activity Recognition | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
Action Localization | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Action Localization | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
Action Detection | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Action Detection | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN
Action Recognition | NTU RGB+D | Accuracy (CS) | 79.6 | Clips+CNN+MTLN
Action Recognition | NTU RGB+D | Accuracy (CV) | 84.8 | Clips+CNN+MTLN

Related Papers

- SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Robust-Multi-Task Gradient Boosting (2025-07-15)
- SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation (2025-07-10)
- Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
- EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
- Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration (2025-06-25)