Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao

2022-10-11

Semantic Segmentation, Point Cloud Segmentation, 3D Semantic Segmentation, 3D Point Cloud Classification, LIDAR Semantic Segmentation, Point Cloud Classification


Abstract

As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which enable better spatial alignment and more efficient sampling. Extensive experiments show that our model achieves better performance than its predecessor and achieves state-of-the-art results on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40. Our code will be available at https://github.com/Gofinge/PointTransformerV2.
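The abstract names partition-based pooling without giving details, so the following is only a minimal sketch of one way such pooling could work, not the authors' implementation. It assumes the partition is a uniform grid of cubic cells, that coordinates are averaged per cell, and that features are max-pooled per channel. The function name `grid_pool` and its parameters are hypothetical.

```python
from collections import defaultdict
from math import floor


def grid_pool(coords, feats, grid_size):
    """Pool points that fall into the same cubic grid cell (illustrative sketch).

    coords: list of (x, y, z) tuples, one per point
    feats: list of equal-length feature lists, one per point
    grid_size: edge length of each cell (must be positive)
    Returns (pooled_coords, pooled_feats, cell_of_point).
    """
    if grid_size <= 0:
        raise ValueError("grid_size must be positive")
    if len(coords) != len(feats):
        raise ValueError("coords and feats must have the same length")

    # Partition step: map every point to the integer index of its cell.
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (floor(x / grid_size), floor(y / grid_size), floor(z / grid_size))
        cells[key].append(i)

    pooled_coords, pooled_feats = [], []
    cell_of_point = [0] * len(coords)
    # Sorting the cell keys makes the output order deterministic.
    for cell_id, key in enumerate(sorted(cells)):
        members = cells[key]
        n = len(members)
        # Assumption: the pooled position is the mean of the member coordinates.
        pooled_coords.append(
            tuple(sum(coords[i][d] for i in members) / n for d in range(3))
        )
        # Assumption: features are reduced with a channel-wise maximum.
        pooled_feats.append([max(ch) for ch in zip(*(feats[i] for i in members))])
        for i in members:
            cell_of_point[i] = cell_id
    return pooled_coords, pooled_feats, cell_of_point


coords = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.0, 0.0)]
feats = [[1.0, 5.0], [2.0, 3.0], [7.0, 0.0]]
pooled_coords, pooled_feats, cell_of_point = grid_pool(coords, feats, grid_size=1.0)
```

In this sketch every point belongs to exactly one cell, so the returned `cell_of_point` list could be reused to copy pooled features back to the original points when upsampling. Whether the released code does this is not stated in the abstract.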

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	ScanNet	test mIoU	75.2	PTv2
Semantic Segmentation	ScanNet	val mIoU	75.4	PTv2
Semantic Segmentation	S3DIS Area5	mAcc	78	PTv2
Semantic Segmentation	S3DIS Area5	mIoU	72.6	PTv2
Semantic Segmentation	S3DIS Area5	oAcc	91.6	PTv2
Semantic Segmentation	ScanNet++	Top-1 IoU	0.445	PTv2
Semantic Segmentation	ScanNet++	Top-3 IoU	0.688	PTv2
Semantic Segmentation	S3DIS	mIoU (Area-5)	71.6	PointTransformerV2
Shape Representation Of 3D Point Clouds	ModelNet40	Mean Accuracy	91.6	PTv2
Shape Representation Of 3D Point Clouds	ModelNet40	Overall Accuracy	94.2	PTv2
3D Semantic Segmentation	ScanNet++	Top-1 IoU	0.445	PTv2
3D Semantic Segmentation	ScanNet++	Top-3 IoU	0.688	PTv2
3D Semantic Segmentation	S3DIS	mIoU (Area-5)	71.6	PointTransformerV2
3D Point Cloud Classification	ModelNet40	Mean Accuracy	91.6	PTv2
3D Point Cloud Classification	ModelNet40	Overall Accuracy	94.2	PTv2
LIDAR Semantic Segmentation	nuScenes	test mIoU	0.826	PTv2
LIDAR Semantic Segmentation	nuScenes	val mIoU	0.802	PTv2
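For readers unfamiliar with the segmentation metrics in the table above, the sketch below shows how mIoU, mAcc, and oAcc are commonly derived from a confusion matrix. It uses the standard definitions, not any benchmark's official evaluation script. The function name `segmentation_metrics`, the percentage scaling, and the choice to skip classes that never appear are assumptions.

```python
def segmentation_metrics(conf):
    """Compute mIoU, mAcc, and oAcc (in percent) from a confusion matrix.

    conf[i][j] is the number of points whose true class is i and whose
    predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    correct = sum(conf[c][c] for c in range(n))
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                         # points labelled c
        pred = sum(conf[r][c] for r in range(n))  # points predicted as c
        union = gt + pred - tp
        # Assumption: classes absent from both labels and predictions are skipped.
        if union > 0:
            ious.append(tp / union)
        # Assumption: per-class accuracy is only defined for classes with labels.
        if gt > 0:
            accs.append(tp / gt)

    return {
        "mIoU": 100.0 * sum(ious) / len(ious),
        "mAcc": 100.0 * sum(accs) / len(accs),
        "oAcc": 100.0 * correct / total,
    }


# Toy two-class example with made-up counts, unrelated to the reported results.
metrics = segmentation_metrics([[8, 2], [1, 9]])
```

mIoU averages the per-class intersection over union, mAcc averages per-class recall, and oAcc is the fraction of all points classified correctly, so oAcc can be high even when rare classes are segmented poorly.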
