PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

2022-05-27 | ICLR 2021 | 3D Action Recognition, Semantic Segmentation, Action Recognition

Abstract

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid-based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet in modeling point cloud sequences.
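
To make the space/time decomposition described in the abstract more concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the authors' implementation: the function names (`spatial_conv`, `pst_conv_sketch`), the linear-in-displacement spatial kernel, the max pooling, the zero padding in time, and the choice of anchors are all hypothetical. It only shows the general shape of the idea: an order-invariant spatial aggregation around each anchor in every frame of a temporal window, followed by an ordered, per-offset temporal weighting.

```python
import numpy as np

def spatial_conv(anchor, points, feats, w_disp, w_feat, radius):
    """Aggregate one frame's points that lie within `radius` of `anchor`."""
    disp = points - anchor                          # (N, 3) displacements
    mask = np.linalg.norm(disp, axis=1) <= radius   # ball query
    if not mask.any():
        return np.zeros(w_feat.shape[1])            # empty neighborhood
    # Assumed kernel: response is linear in displacement and input feature.
    resp = disp[mask] @ w_disp + feats[mask] @ w_feat   # (K, C_mid)
    return resp.max(axis=0)                         # order-invariant pooling

def pst_conv_sketch(xyz, feats, anchors, w_disp, w_feat, w_time, radius):
    """Decomposed spatio-temporal convolution (illustrative only).

    xyz:     (T, N, 3)   point coordinates per frame
    feats:   (T, N, C_in) point features per frame
    anchors: (T, M, 3)   anchor coordinates per frame
    w_time:  (2r+1, C_mid, C_out) one weight matrix per temporal offset
    """
    T, M = anchors.shape[0], anchors.shape[1]
    r = w_time.shape[0] // 2
    out = np.zeros((T, M, w_time.shape[2]))
    for t in range(T):
        for m in range(M):
            for k in range(-r, r + 1):
                tk = t + k
                if tk < 0 or tk >= T:
                    continue                        # zero padding in time
                # Spatial step: local structure around the anchor in frame tk.
                s = spatial_conv(anchors[t, m], xyz[tk], feats[tk],
                                 w_disp, w_feat, radius)
                # Temporal step: offset-specific weights keep frame order.
                out[t, m] += s @ w_time[k + r]
    return out

# Toy usage with random data and weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
T, N, M, C_in, C_mid, C_out = 4, 32, 4, 5, 8, 6
xyz = rng.normal(size=(T, N, 3))
feats = rng.normal(size=(T, N, C_in))
anchors = xyz[:, :M, :].copy()      # assumption: first M points as anchors
w_disp = rng.normal(size=(3, C_mid))
w_feat = rng.normal(size=(C_in, C_mid))
w_time = rng.normal(size=(3, C_mid, C_out))   # temporal window of 3 frames
out = pst_conv_sketch(xyz, feats, anchors, w_disp, w_feat, w_time, radius=1.5)
print(out.shape)
```

Because the spatial step pools over neighbors, shuffling the points within a frame leaves the output unchanged, while the per-offset temporal weights make the result depend on frame order. This mirrors the abstract's observation that point cloud sequences are unordered in space but ordered in time.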

Results

Task	Dataset	Metric	Value	Model
Video	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
Video	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
Temporal Action Localization	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
Temporal Action Localization	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
Zero-Shot Learning	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
Zero-Shot Learning	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
Activity Recognition	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
Activity Recognition	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
Action Localization	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
Action Localization	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
3D Action Recognition	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
3D Action Recognition	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
Action Recognition	NTU RGB+D	Cross Subject Accuracy	90.5	PSTNet
Action Recognition	NTU RGB+D	Cross View Accuracy	96.5	PSTNet
