Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng

Published: 2023-09-01
Tasks: Question Answering · Instruction Following · 3D Generation · 3D Question Answering (3D-QA) · Parameter-Efficient Fine-Tuning · Generative 3D Object Classification · Large Language Model · Language Modelling
Links: Paper · PDF · Code (official)

Abstract

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multi-modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this, we further present Point-LLM, the first 3D large language model (LLM) following 3D multi-modal instructions. By parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA, which requires no 3D instruction data, but exhibits superior 3D and multi-modal question-answering capacity. We hope our work may cast a light on the community for extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.
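The joint embedding space described in the abstract is what enables applications like 3D embedding arithmetic and cross-modal retrieval: embeddings from different modalities live in one space, so they can be added and compared by cosine similarity. A minimal sketch of that idea, using random stand-in vectors rather than real encoder outputs (all names here are illustrative, not the actual Point-Bind API):

```python
import numpy as np

def normalize(v):
    # Project an embedding onto the unit sphere so that the dot
    # product of two embeddings equals their cosine similarity.
    return v / np.linalg.norm(v)

# Stand-in embeddings: in Point-Bind these would come from the
# per-modality encoders aligned into the shared ImageBind-guided space.
rng = np.random.default_rng(0)
dim = 8
car_3d = normalize(rng.normal(size=dim))  # e.g. point cloud of a car
sound = normalize(rng.normal(size=dim))   # e.g. audio clip of an engine
gallery = {name: normalize(rng.normal(size=dim))
           for name in ["chair", "lamp", "racing car"]}

# 3D embedding arithmetic: compose two modalities by vector addition,
# then renormalize to stay on the unit sphere.
query = normalize(car_3d + sound)

# Cross-modal retrieval: nearest neighbour by cosine similarity.
best = max(gallery, key=lambda name: float(query @ gallery[name]))
print(best)
```

With real encoders, `gallery` would hold embeddings of candidate 3D shapes (or images, or text) and the retrieved neighbour would be semantically related to both inputs; with random vectors the retrieval is arbitrary, so the sketch only demonstrates the mechanics.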

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Visual Question Answering (VQA) | 3D MM-Vet | Overall Accuracy | 23.5 | Point-Bind & Point-LLM |
| 3D | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| 3D | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| 3D | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| 3D | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |
| Shape Representation Of 3D Point Clouds | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| Shape Representation Of 3D Point Clouds | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| Shape Representation Of 3D Point Clouds | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| Shape Representation Of 3D Point Clouds | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |
| 3D Object Classification | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| 3D Object Classification | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| 3D Object Classification | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| 3D Object Classification | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |
| 3D Point Cloud Classification | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| 3D Point Cloud Classification | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| 3D Point Cloud Classification | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| 3D Point Cloud Classification | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |
| 3D Classification | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| 3D Classification | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| 3D Classification | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| 3D Classification | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |
| 3D Point Cloud Reconstruction | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| 3D Point Cloud Reconstruction | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| 3D Point Cloud Reconstruction | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| 3D Point Cloud Reconstruction | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |
| Generative 3D Object Classification | Objaverse | Objaverse (Average) | 5.25 | Point-Bind LLM |
| Generative 3D Object Classification | Objaverse | Objaverse (C) | 4.5 | Point-Bind LLM |
| Generative 3D Object Classification | Objaverse | Objaverse (I) | 6 | Point-Bind LLM |
| Generative 3D Object Classification | ModelNet40 | ModelNet40 (Average) | 45.81 | Point-Bind LLM |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning (2025-07-17)
- AutoPartGen: Autogressive 3D Part Generation and Discovery (2025-07-17)