Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng

2023-09-01

Question Answering, Instruction Following, 3D Generation, 3D Question Answering (3D-QA), parameter-efficient fine-tuning, Generative 3D Object Classification, Large Language Model, Language Modelling


Abstract

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D images, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multiple modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this, we further present Point-LLM, the first 3D large language model (LLM) following 3D multi-modal instructions. Using parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA; it requires no 3D instruction data, yet exhibits superior 3D and multi-modal question-answering capacity. We hope our work may shed light for the community on extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.
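
To make the idea of a joint embedding space more concrete, the following is a minimal sketch of how open-world (zero-shot) classification and embedding arithmetic could work once every modality is mapped into one shared vector space. It is not the authors' implementation: the three-dimensional vectors are hand-made toy values standing in for the outputs of learned encoders, and the function names (`zero_shot_classify`, `combine`, `retrieve`) are hypothetical, introduced only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def zero_shot_classify(point_emb, text_embs):
    """Pick the label whose text embedding is closest to the point-cloud embedding."""
    return max(text_embs, key=lambda label: cosine(point_emb, text_embs[label]))

def combine(emb_a, emb_b):
    """Embedding arithmetic (assumed form): add two unit embeddings, then renormalize."""
    a, b = normalize(emb_a), normalize(emb_b)
    return normalize([x + y for x, y in zip(a, b)])

def retrieve(query, gallery):
    """Return the gallery item most similar to the query embedding."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))

# Toy embeddings (assumed values, not produced by any real encoder).
text_embs = {
    "car": [1.0, 0.0, 0.0],
    "sea": [0.0, 1.0, 0.0],
    "chair": [0.0, 0.0, 1.0],
}
point_car = [0.9, 0.1, 0.0]    # stand-in for a 3D point-cloud embedding of a car
audio_waves = [0.0, 1.0, 0.1]  # stand-in for an audio embedding of sea waves
gallery = {
    "car on a road": [1.0, 0.0, 0.2],
    "car by the sea": [0.7, 0.7, 0.0],
    "boat at sea": [0.1, 1.0, 0.0],
}

predicted = zero_shot_classify(point_car, text_embs)
query = combine(point_car, audio_waves)
best = retrieve(query, gallery)
print(predicted, "|", best)
```

With these toy values, the point-cloud embedding is matched to the "car" label, and the sum of the car and sea-wave embeddings retrieves the item that mixes both concepts. In a real system the vectors would come from trained encoders, and the combination rule may differ from the simple normalized sum assumed here.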

Results

Task	Dataset	Metric	Value	Model
Visual Question Answering (VQA)	3D MM-Vet	Overall Accuracy	23.5	Point-Bind & Point-LLM
3D	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
3D	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
3D	Objaverse	Objaverse (I)	6	Point-Bind LLM
3D	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
Shape Representation Of 3D Point Clouds	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
Shape Representation Of 3D Point Clouds	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
Shape Representation Of 3D Point Clouds	Objaverse	Objaverse (I)	6	Point-Bind LLM
Shape Representation Of 3D Point Clouds	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
3D Object Classification	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
3D Object Classification	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
3D Object Classification	Objaverse	Objaverse (I)	6	Point-Bind LLM
3D Object Classification	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
3D Point Cloud Classification	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
3D Point Cloud Classification	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
3D Point Cloud Classification	Objaverse	Objaverse (I)	6	Point-Bind LLM
3D Point Cloud Classification	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
3D Classification	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
3D Classification	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
3D Classification	Objaverse	Objaverse (I)	6	Point-Bind LLM
3D Classification	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
3D Point Cloud Reconstruction	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
3D Point Cloud Reconstruction	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
3D Point Cloud Reconstruction	Objaverse	Objaverse (I)	6	Point-Bind LLM
3D Point Cloud Reconstruction	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
Generative 3D Object Classification	Objaverse	Objaverse (Average)	5.25	Point-Bind LLM
Generative 3D Object Classification	Objaverse	Objaverse (C)	4.5	Point-Bind LLM
Generative 3D Object Classification	Objaverse	Objaverse (I)	6	Point-Bind LLM
Generative 3D Object Classification	ModelNet40	ModelNet40 (Average)	45.81	Point-Bind LLM
