Point Transformer V3: Simpler, Faster, Stronger

Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao

2023-12-15 · Representation Learning, Semantic Segmentation, 3D Semantic Segmentation, LIDAR Semantic Segmentation

Abstract

This paper does not seek innovation within the attention mechanism. Instead, it focuses on overcoming the existing trade-offs between accuracy and efficiency in point cloud processing by leveraging the power of scale. Drawing inspiration from recent advances in large-scale 3D representation learning, we recognize that model performance is influenced more by scale than by intricate design. We therefore present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms that matter little to overall performance after scaling, for example replacing precise KNN neighbor search with an efficient serialized neighbor mapping of point clouds organized in specific patterns. This principle enables significant scaling, expanding the receptive field from 16 to 1024 points while remaining efficient (a 3x increase in processing speed and a 10x improvement in memory efficiency compared with its predecessor, PTv2). PTv3 attains state-of-the-art results on over 20 downstream tasks spanning both indoor and outdoor scenarios. With multi-dataset joint training, PTv3 pushes these results further.
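
The central efficiency idea above is to stop searching for each point's exact nearest neighbors and instead define neighborhoods by position along a serialization of the point cloud. The sketch below illustrates that idea under stated assumptions: it quantizes points to a voxel grid, orders them along a Z-order (Morton) curve, and cuts the ordered list into fixed-size patches. The function names, the `grid_size`, `patch_size` and `bits` parameters, and the choice of a single Z-order curve are all hypothetical illustrations, not PTv3's actual implementation. A brute-force KNN is included only for contrast.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton_code_3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three grid indices into one Z-order key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def serialize(points: Sequence[Point], grid_size: float, bits: int = 10) -> List[int]:
    """Return point indices ordered along a Z-order curve over a voxel grid."""
    if not points:
        return []
    mins = [min(p[d] for p in points) for d in range(3)]
    # Quantize each point to integer voxel coordinates relative to the bounding-box corner.
    cells = [tuple(int((p[d] - mins[d]) // grid_size) for d in range(3)) for p in points]
    if any(c >= (1 << bits) for cell in cells for c in cell):
        raise ValueError("grid_size too small for the chosen number of bits")
    # A single O(N log N) sort stands in for a per-point neighbor search.
    return sorted(range(len(points)), key=lambda i: morton_code_3d(*cells[i], bits=bits))


def serialized_patches(points: Sequence[Point], grid_size: float,
                       patch_size: int, bits: int = 10) -> List[List[int]]:
    """Group consecutive points in serialized order; the last patch may be shorter."""
    order = serialize(points, grid_size, bits)
    return [order[s:s + patch_size] for s in range(0, len(order), patch_size)]


def knn(points: Sequence[Point], query: int, k: int) -> List[int]:
    """Exact brute-force KNN (the query itself is included), shown for contrast."""
    q = points[query]
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], q))[:k]
```

In this sketch every point's neighborhood comes from slicing one sorted list, so the patch size (the attention window) can grow from tens to about a thousand points without extra search cost. The price is that points adjacent on the curve are not always the nearest in space, which is the kind of small accuracy loss the abstract argues is acceptable once the model is scaled.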

Results

Task	Dataset	Metric	Value (%)	Model
Semantic Segmentation	ScanNet	test mIoU	79.4	PTv3 + PPT
Semantic Segmentation	ScanNet	val mIoU	78.6	PTv3 + PPT
Semantic Segmentation	S3DIS Area5	mAcc	80.1	PTv3 + PPT
Semantic Segmentation	S3DIS Area5	mIoU	74.7	PTv3 + PPT
Semantic Segmentation	S3DIS Area5	oAcc	92.0	PTv3 + PPT
Semantic Segmentation	S3DIS	mIoU	80.8	PTv3 + PPT
Semantic Segmentation	S3DIS	mAcc	87.7	PTv3 + PPT
Semantic Segmentation	S3DIS	oAcc	92.6	PTv3 + PPT
Semantic Segmentation	ScanNet200	test mIoU	39.3	PTv3 + PPT
Semantic Segmentation	ScanNet200	val mIoU	36.0	PTv3 + PPT
Semantic Segmentation	ScanNet++	Top-1 IoU	48.8	PTv3
Semantic Segmentation	ScanNet++	Top-3 IoU	72.5	PTv3
3D Semantic Segmentation	ScanNet200	test mIoU	39.3	PTv3 + PPT
3D Semantic Segmentation	ScanNet200	val mIoU	36.0	PTv3 + PPT
3D Semantic Segmentation	ScanNet++	Top-1 IoU	48.8	PTv3
3D Semantic Segmentation	ScanNet++	Top-3 IoU	72.5	PTv3
LIDAR Semantic Segmentation	nuScenes	test mIoU	83.0	PTv3 + PPT
LIDAR Semantic Segmentation	nuScenes	val mIoU	81.2	PTv3 + PPT
