
PointCLIP: Point Cloud Understanding by CLIP

Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

2021-12-04 · CVPR 2022

Tasks: Zero-shot 3D Classification, Few-Shot Learning, Zero-shot 3D Point Cloud Classification, Training-free 3D Part Segmentation, Transfer Learning, Zero-Shot Transfer 3D Point Cloud Classification, 3D Open-Vocabulary Instance Segmentation, Open Vocabulary Object Detection, Training-free 3D Point Cloud Classification

Abstract

Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP), which learns to match images with their corresponding texts in open-vocabulary settings, have shown inspiring performance on 2D visual recognition. However, it remains underexplored whether CLIP, pre-trained on large-scale 2D image-text pairs, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which aligns CLIP-encoded point clouds with 3D category texts. Specifically, we encode a point cloud by projecting it into multi-view depth maps without rendering, and aggregate the view-wise zero-shot predictions to achieve knowledge transfer from 2D to 3D. On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D. By fine-tuning just this lightweight adapter in few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe a complementary property between PointCLIP and classical 3D-supervised networks: by simple ensembling, PointCLIP boosts the baseline's performance and even surpasses state-of-the-art models. PointCLIP is therefore a promising alternative for effective 3D point cloud understanding via CLIP at low resource cost and in low-data regimes. We conduct thorough experiments on the widely adopted ModelNet10 and ModelNet40 benchmarks and on the challenging ScanObjectNN to demonstrate the effectiveness of PointCLIP. The code is released at https://github.com/ZrrSkywalker/PointCLIP.
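
The zero-shot pipeline described above (rendering-free multi-view depth projection, CLIP encoding of each view, and weighted aggregation of view-wise predictions) can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes OpenAI's `clip` package, substitutes a simplified orthographic projection for the paper's depth-map generation, and treats the prompt template and `view_weights` as illustrative choices.

```python
# Minimal sketch of PointCLIP-style zero-shot classification.
# Assumes OpenAI's CLIP package (pip install git+https://github.com/openai/CLIP).
import torch
import clip


def point_cloud_to_depth_map(points, size=224):
    """Project an (N, 3) point cloud onto the xy-plane as a depth map.

    A simplified orthographic stand-in for the paper's rendering-free
    multi-view projection; nearer points win per pixel.
    """
    pts = points - points.min(dim=0).values
    pts = pts / (pts.max() + 1e-6)  # normalize into the unit cube
    u = (pts[:, 0] * (size - 1)).long()
    v = (pts[:, 1] * (size - 1)).long()
    # Per-pixel max over "closeness" (1 - depth), so nearer points dominate.
    flat = torch.zeros(size * size)
    flat.scatter_reduce_(0, v * size + u, 1.0 - pts[:, 2], reduce="amax")
    depth = flat.view(size, size)
    return depth.expand(3, -1, -1)  # replicate to 3 channels for CLIP


@torch.no_grad()
def zero_shot_classify(points, class_names, views, view_weights, device="cpu"):
    model, _ = clip.load("ViT-B/32", device=device)
    # Illustrative prompt template; the paper explores prompt designs.
    tokens = clip.tokenize(
        [f"point cloud depth map of a {c}." for c in class_names]
    ).to(device)
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    logits = 0.0
    for rot, weight in zip(views, view_weights):
        # Rotate the cloud into this view, then project without rendering.
        depth = point_cloud_to_depth_map(points @ rot.T).unsqueeze(0).to(device)
        img_feat = model.encode_image(depth)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        # Weighted sum aggregates the view-wise zero-shot predictions.
        logits = logits + weight * (100.0 * img_feat @ text_feat.T)
    return logits.softmax(dim=-1)


if __name__ == "__main__":
    pts = torch.rand(1024, 3)  # dummy point cloud
    front = torch.eye(3)
    side = torch.tensor([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
    probs = zero_shot_classify(pts, ["chair", "table", "airplane"],
                               views=[front, side], view_weights=[0.5, 0.5])
    print(probs)
```

The paper's few-shot variant would additionally insert a lightweight inter-view adapter between the per-view image features and the final logits, fine-tuning only the adapter while keeping CLIP frozen; the ensembling result in the abstract amounts to summing these logits with those of a 3D-supervised classifier.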

Results

Task | Dataset | Metric | Value | Model
Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ_BG Accuracy (%) | 21.34 | PointCLIP
Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ_ONLY Accuracy (%) | 19.28 | PointCLIP
Shape Representation Of 3D Point Clouds | ScanObjectNN | PB_T50_RS Accuracy (%) | 15.38 | PointCLIP
Shape Representation Of 3D Point Clouds | ModelNet40 | Accuracy (%) | 20.18 | PointCLIP
Shape Representation Of 3D Point Clouds | ModelNet10 | Accuracy (%) | 30.23 | PointCLIP
3D Point Cloud Classification | ScanObjectNN | OBJ_BG Accuracy (%) | 21.34 | PointCLIP
3D Point Cloud Classification | ScanObjectNN | OBJ_ONLY Accuracy (%) | 19.28 | PointCLIP
3D Point Cloud Classification | ScanObjectNN | PB_T50_RS Accuracy (%) | 15.38 | PointCLIP
3D Point Cloud Classification | ModelNet40 | Accuracy (%) | 20.18 | PointCLIP
3D Point Cloud Classification | ModelNet10 | Accuracy (%) | 30.23 | PointCLIP
Training-free 3D Point Cloud Classification | ModelNet40 | Accuracy (%) | 20.2 | PointCLIP
Training-free 3D Point Cloud Classification | ScanObjectNN | Accuracy (%) | 15.4 | PointCLIP
Training-free 3D Part Segmentation | ShapeNet-Part | mIoU | 31 | PointCLIP
3D Open-Vocabulary Instance Segmentation | STPLS3D | AP50 | 2.6 | PointCLIP
3D Point Cloud Reconstruction | ScanObjectNN | OBJ_BG Accuracy (%) | 21.34 | PointCLIP
3D Point Cloud Reconstruction | ScanObjectNN | OBJ_ONLY Accuracy (%) | 19.28 | PointCLIP
3D Point Cloud Reconstruction | ScanObjectNN | PB_T50_RS Accuracy (%) | 15.38 | PointCLIP
3D Point Cloud Reconstruction | ModelNet40 | Accuracy (%) | 20.18 | PointCLIP
3D Point Cloud Reconstruction | ModelNet10 | Accuracy (%) | 30.23 | PointCLIP

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
Robust-Multi-Task Gradient Boosting (2025-07-15)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift (2025-07-12)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection (2025-07-10)