Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao

Date: 2022-10-11
Tasks: Semantic Segmentation, Point Cloud Segmentation, 3D Semantic Segmentation, 3D Point Cloud Classification, LIDAR Semantic Segmentation, Point Cloud Classification

Abstract

As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which enable better spatial alignment and more efficient sampling. Extensive experiments show that our model achieves better performance than its predecessor and achieves state-of-the-art results on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40. Our code will be available at https://github.com/Gofinge/PointTransformerV2.
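The partition-based pooling idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the partition is a uniform grid, that features are max-pooled within each cell, and that coordinates are averaged within each cell. The function name `grid_pool` and the parameter `cell_size` are hypothetical names chosen for this illustration.

```python
import numpy as np

def grid_pool(coords, feats, cell_size):
    """Pool points that fall into the same grid cell (illustrative sketch).

    coords: (N, 3) float array of point positions.
    feats:  (N, C) float array of per-point features.
    cell_size: edge length of the cubic grid cells (assumed uniform).
    Returns pooled coordinates (M, 3), pooled features (M, C), and the
    cell index of every input point (N,).
    """
    coords = np.asarray(coords, dtype=np.float64)
    feats = np.asarray(feats, dtype=np.float64)

    # Assign every point to an integer grid cell.
    cells = np.floor(coords / cell_size).astype(np.int64)

    # Map each distinct cell to a compact index 0..M-1.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_cells = int(inverse.max()) + 1

    # Mean-pool coordinates per cell (assumption of this sketch).
    counts = np.bincount(inverse, minlength=n_cells)
    pooled_coords = np.zeros((n_cells, coords.shape[1]))
    np.add.at(pooled_coords, inverse, coords)
    pooled_coords /= counts[:, None]

    # Max-pool features per cell (assumption of this sketch).
    pooled_feats = np.full((n_cells, feats.shape[1]), -np.inf)
    np.maximum.at(pooled_feats, inverse, feats)

    return pooled_coords, pooled_feats, inverse
```

Because every point belongs to exactly one cell, the returned `inverse` array also records which pooled point each original point maps to, which is the kind of bookkeeping an unpooling step in a decoder would need.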

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ScanNet | test mIoU | 75.2 | PTv2 |
| Semantic Segmentation | ScanNet | val mIoU | 75.4 | PTv2 |
| Semantic Segmentation | S3DIS Area5 | mAcc | 78 | PTv2 |
| Semantic Segmentation | S3DIS Area5 | mIoU | 72.6 | PTv2 |
| Semantic Segmentation | S3DIS Area5 | oAcc | 91.6 | PTv2 |
| Semantic Segmentation | ScanNet++ | Top-1 IoU | 0.445 | PTv2 |
| Semantic Segmentation | ScanNet++ | Top-3 IoU | 0.688 | PTv2 |
| Semantic Segmentation | S3DIS | mIoU (Area-5) | 71.6 | PointTransformerV2 |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Mean Accuracy | 91.6 | PTv2 |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 94.2 | PTv2 |
| 3D Semantic Segmentation | ScanNet++ | Top-1 IoU | 0.445 | PTv2 |
| 3D Semantic Segmentation | ScanNet++ | Top-3 IoU | 0.688 | PTv2 |
| 3D Semantic Segmentation | S3DIS | mIoU (Area-5) | 71.6 | PointTransformerV2 |
| 3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 91.6 | PTv2 |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94.2 | PTv2 |
| LIDAR Semantic Segmentation | nuScenes | test mIoU | 0.826 | PTv2 |
| LIDAR Semantic Segmentation | nuScenes | val mIoU | 0.802 | PTv2 |
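
For readers unfamiliar with the segmentation metrics reported above, the following sketch shows how mIoU, mAcc, and oAcc are conventionally computed from a confusion matrix. It uses the standard definitions and is not taken from the paper's evaluation code. The function name `segmentation_metrics` is hypothetical, and the sketch assumes rows index ground-truth classes and columns index predicted classes. It returns fractions in [0, 1], whereas the listing mixes percentages and fractions.

```python
import numpy as np

def segmentation_metrics(confusion):
    """Compute mIoU, mAcc, and oAcc from a confusion matrix.

    confusion[i, j] counts points of ground-truth class i predicted as
    class j (row/column convention is an assumption of this sketch).
    Classes with no ground-truth points are skipped in the class means.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    true_pos = np.diag(confusion)
    gt_total = confusion.sum(axis=1)    # points per ground-truth class
    pred_total = confusion.sum(axis=0)  # points per predicted class
    union = gt_total + pred_total - true_pos

    valid = gt_total > 0
    iou = true_pos[valid] / union[valid]
    class_acc = true_pos[valid] / gt_total[valid]

    return {
        "mIoU": float(iou.mean()),                       # mean over classes of TP / (GT + Pred - TP)
        "mAcc": float(class_acc.mean()),                 # mean over classes of TP / GT
        "oAcc": float(true_pos.sum() / confusion.sum()), # fraction of all points labeled correctly
    }
```

Multiplying the returned values by 100 gives the percentage form used for the ScanNet and S3DIS rows.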

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)