Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-positioning Point-based Transformer for Point Cloud Understanding

Jinyoung Park, Sanghyeok Lee, Sihyeon Kim, Yunyang Xiong, Hyunwoo J. Kim

2023-03-29 · CVPR 2023

Tasks: Scene Segmentation · Semantic Segmentation · Supervised Only 3D Point Cloud Classification · 3D Part Segmentation · 3D Point Cloud Classification

Paper · PDF · Code (official)

Abstract

Transformers have shown superior performance on various computer vision tasks with their capabilities to capture long-range dependencies. Despite the success, it is challenging to directly apply Transformers on point clouds due to their quadratic cost in the number of points. In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity. Specifically, this architecture consists of local self-attention and self-positioning point-based global cross-attention. The self-positioning points, adaptively located based on the input shape, consider both spatial and semantic information with disentangled attention to improve expressive power. With the self-positioning points, we propose a novel global cross-attention mechanism for point clouds, which improves the scalability of global self-attention by allowing the attention module to compute attention weights with only a small set of self-positioning points. Experiments show the effectiveness of SPoTr on three point cloud tasks such as shape classification, part segmentation, and scene segmentation. In particular, our proposed model achieves an accuracy gain of 2.6% over the previous best models on shape classification with ScanObjectNN. We also provide qualitative analyses to demonstrate the interpretability of self-positioning points. The code of SPoTr is available at https://github.com/mlvlab/SPoTr.
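The complexity reduction described above can be illustrated with a minimal sketch: instead of full self-attention over all N points (O(N²)), each point attends only to a small set of G self-positioning points (O(N·G)). This is not the official SPoTr implementation — it omits the adaptive placement of the self-positioning points and the disentangled spatial/semantic attention, and all names and shapes here are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_cross_attention(point_feats, sp_feats):
    """Cross-attention from N point features to G self-positioning
    point features. The attention matrix is (N, G) with G << N,
    so the cost is O(N*G) rather than the O(N^2) of full
    point-to-point self-attention."""
    d = point_feats.shape[-1]
    attn = softmax(point_feats @ sp_feats.T / np.sqrt(d))  # (N, G)
    return attn @ sp_feats  # (N, d) globally-mixed features

# Toy example: 1024 points, 16 self-positioning points, 32-dim features.
rng = np.random.default_rng(0)
N, G, d = 1024, 16, 32
out = global_cross_attention(rng.normal(size=(N, d)),
                             rng.normal(size=(G, d)))
print(out.shape)  # (1024, 32)
```

The key point is that the (N, G) attention matrix replaces the (N, N) one, which is what makes global context affordable on dense point clouds.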

Results

Task                          | Dataset       | Metric               | Value | Model
Semantic Segmentation         | S3DIS Area5   | mAcc                 | 76.4  | SPoTr
Semantic Segmentation         | S3DIS Area5   | mIoU                 | 70.8  | SPoTr
Semantic Segmentation         | S3DIS Area5   | oAcc                 | 90.7  | SPoTr
Semantic Segmentation         | ShapeNet-Part | Class Average IoU    | 85.4  | SPoTr
Semantic Segmentation         | ShapeNet-Part | Instance Average IoU | 87.2  | SPoTr
3D Point Cloud Classification | ScanObjectNN  | Mean Accuracy                | 86.8  | SPoTr
3D Point Cloud Classification | ScanObjectNN  | Overall Accuracy             | 88.6  | SPoTr
3D Point Cloud Classification | ScanObjectNN  | GFLOPs                       | 10.8  | SPoTr
3D Point Cloud Classification | ScanObjectNN  | Number of params (M)         | 1.7   | SPoTr
3D Point Cloud Classification | ScanObjectNN  | Overall Accuracy (PB_T50_RS) | 88.6  | SPoTr

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)