Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao
Large-scale pre-trained models have shown promising open-world performance for both vision and language tasks. However, their transfer capability on 3D point clouds remains limited, constrained to the classification task alone. In this paper, we combine CLIP and GPT into a unified 3D open-world learner, named PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection. To better align 3D data with the pre-trained language knowledge, PointCLIP V2 contains two key designs. On the visual end, we prompt CLIP via a shape projection module to generate more realistic depth maps, narrowing the domain gap between projected point clouds and natural images. On the textual end, we prompt the GPT model to generate 3D-specific text as the input to CLIP's textual encoder. Without any training in 3D domains, our approach significantly surpasses PointCLIP by +42.90%, +40.44%, and +28.75% accuracy on three datasets for zero-shot 3D classification. On top of that, V2 can be extended to few-shot 3D classification, zero-shot 3D part segmentation, and 3D object detection in a simple manner, demonstrating its generalization ability for unified 3D open-world learning.
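The visual end of the method projects a point cloud into a depth map that CLIP can consume. Below is a minimal NumPy sketch of that projection idea for a single view; it only quantizes x/y coordinates to a pixel grid and keeps the nearest depth per pixel. The function name, grid size, and the toy sphere cloud are illustrative assumptions, and the paper's actual module includes further densifying and smoothing steps that are omitted here.

```python
import numpy as np

def project_depth_map(points, grid=64):
    """Project a point cloud (N, 3) onto a depth map seen from the +z axis.

    Simplified sketch of the shape-projection idea: normalize the cloud,
    quantize x/y to pixel coordinates, and keep the nearest depth per
    pixel (the paper's densify/smooth steps are omitted).
    """
    pts = points - points.min(axis=0)         # shift to the positive octant
    pts = pts / (pts.max() + 1e-8)            # normalize coordinates to [0, 1]
    xy = np.clip((pts[:, :2] * (grid - 1)).astype(int), 0, grid - 1)
    depth = np.zeros((grid, grid), dtype=np.float32)
    for (x, y), z in zip(xy, pts[:, 2]):
        depth[y, x] = max(depth[y, x], 1.0 - z)  # nearer points render brighter
    return depth

# Toy example: a unit sphere sampled with 2048 points
rng = np.random.default_rng(0)
v = rng.normal(size=(2048, 3))
sphere = v / np.linalg.norm(v, axis=1, keepdims=True)
dmap = project_depth_map(sphere)
print(dmap.shape)  # (64, 64)
```

In the full pipeline, depth maps from several views would be fed to CLIP's visual encoder and matched against GPT-generated 3D-specific text embeddings; that zero-shot matching step is not reproduced in this sketch.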
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Point Cloud Classification | ScanObjectNN | OBJ_BG Accuracy (%) | 41.22 | PointCLIP V2 |
| 3D Point Cloud Classification | ScanObjectNN | OBJ_ONLY Accuracy (%) | 50.09 | PointCLIP V2 |
| 3D Point Cloud Classification | ScanObjectNN | PB_T50_RS Accuracy (%) | 35.36 | PointCLIP V2 |
| 3D Point Cloud Classification | ModelNet40 | Accuracy (%) | 64.22 | PointCLIP V2 |
| 3D Point Cloud Classification | ModelNet10 | Accuracy (%) | 73.13 | PointCLIP V2 |
| Training-free 3D Point Cloud Classification | ModelNet40 | Accuracy (%) | 64.2 | PointCLIP V2 |
| Training-free 3D Point Cloud Classification | ScanObjectNN | Accuracy (%) | 35.4 | PointCLIP V2 |
| Training-free 3D Part Segmentation | ShapeNet-Part | mIoU | 48.4 | PointCLIP V2 |
| 3D Open-Vocabulary Instance Segmentation | STPLS3D | AP50 | 3.1 | PointCLIP V2 |