PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

2022-05-27 · ICLR 2021 1 · 3D Action Recognition · Semantic Segmentation · Action Recognition

Abstract

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid-based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet in modeling point cloud sequences.
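
The decomposition described in the abstract (a spatial step per frame, followed by a temporal step across frames) can be illustrated with a small sketch. This is not the authors' implementation: the function names `spatial_aggregate` and `pst_conv_sketch` are hypothetical, features are scalars, and a simple mean over a radius neighborhood stands in for the learned spatial convolution. The temporal kernel weights are fixed by hand here, whereas a real layer would learn them.

```python
import math

def spatial_aggregate(anchor, points, feats, radius):
    """Mean feature of the points within `radius` of `anchor` (0.0 if none).

    Assumption: a plain mean replaces the learned spatial convolution.
    """
    selected = [f for p, f in zip(points, feats) if math.dist(anchor, p) <= radius]
    return sum(selected) / len(selected) if selected else 0.0

def pst_conv_sketch(frames, feats, anchors, t, radius, temporal_kernel):
    """Decomposed space-time operation centred on frame index `t`.

    frames: list of frames, each a list of (x, y, z) tuples
    feats: per-frame lists of scalar features, aligned with `frames`
    anchors: (x, y, z) locations at which outputs are computed
    temporal_kernel: odd-length list of weights over the temporal window
    """
    half = len(temporal_kernel) // 2
    out = []
    for a in anchors:
        total = 0.0
        for k, w in enumerate(temporal_kernel):
            s = t + k - half
            if 0 <= s < len(frames):  # frames outside the sequence contribute nothing
                # Spatial step inside frame s, then temporal weighting across frames.
                total += w * spatial_aggregate(a, frames[s], feats[s], radius)
        out.append(total)
    return out

# Toy sequence: three frames with two points each, one scalar feature per point.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
feats = [[1.0, 3.0], [2.0, 4.0], [6.0, 100.0]]
result = pst_conv_sketch(frames, feats, anchors=[(0.0, 0.0, 0.0)], t=1,
                         radius=0.5, temporal_kernel=[0.25, 0.5, 0.25])
print(result)  # [3.25]
```

The anchor at the origin sees one neighbor in the first frame, two in the second, and one in the third, so the spatial step yields 1.0, 3.0, and 6.0. The temporal kernel then combines them into 0.25 × 1.0 + 0.5 × 3.0 + 0.25 × 6.0 = 3.25. Because the spatial neighborhood is searched independently in every frame, points need no correspondence across time, which is the reason for separating the two dimensions.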

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
Video | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet
Temporal Action Localization | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
Temporal Action Localization | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet
Zero-Shot Learning | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
Zero-Shot Learning | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet
Activity Recognition | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
Activity Recognition | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet
Action Localization | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
Action Localization | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet
3D Action Recognition | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
3D Action Recognition | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet
Action Recognition | NTU RGB+D | Cross Subject Accuracy | 90.5 | PSTNet
Action Recognition | NTU RGB+D | Cross View Accuracy | 96.5 | PSTNet

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)