Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

Date: 2023-08-31
Tasks: 3D Object Captioning, Common Sense Reasoning, 3D Question Answering (3D-QA), Generative 3D Object Classification, 3D Object Classification

Abstract

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, enabling LLMs to understand point clouds and offering a new avenue beyond 2D visual data. PointLLM understands colored object point clouds with human instructions and generates contextually appropriate responses, illustrating its grasp of point clouds and common sense. Specifically, it leverages a point cloud encoder with a powerful LLM to effectively fuse geometric, appearance, and linguistic information. We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs to enable a two-stage training strategy: aligning latent spaces and subsequently instruction-tuning the unified model. To rigorously evaluate the perceptual and generalization capabilities of PointLLM, we establish two benchmarks: Generative 3D Object Classification and 3D Object Captioning, assessed through three different methods, including human evaluation, GPT-4/ChatGPT evaluation, and traditional metrics. Experimental results reveal PointLLM's superior performance over existing 2D and 3D baselines, with a notable achievement in human-evaluated object captioning tasks where it surpasses human annotators in over 50% of the samples. Codes, datasets, and benchmarks are available at https://github.com/OpenRobotLab/PointLLM .
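The abstract says PointLLM pairs a point cloud encoder with an LLM to fuse geometric, appearance, and linguistic information. The sketch below illustrates one common way such a fusion can be wired: point features are projected to the LLM's embedding width and spliced into the text embedding sequence at a placeholder position. This is an assumption for illustration only; the abstract does not give this level of detail, and `encode_point_cloud`, `project`, and `splice` are hypothetical stand-ins (simple averaging and a fixed matrix), not the authors' learned modules.

```python
# Illustrative sketch only: hypothetical stand-ins, not the PointLLM implementation.

def encode_point_cloud(points, num_tokens=4):
    """Pool a colored point cloud into `num_tokens` feature vectors.

    points: list of (x, y, z, r, g, b) tuples.
    A real encoder would be a learned network; here each token is just the
    mean of a strided subset of the points.
    """
    if len(points) < num_tokens:
        raise ValueError("need at least num_tokens points")
    tokens = []
    for i in range(num_tokens):
        group = points[i::num_tokens]  # every num_tokens-th point, offset i
        tokens.append([sum(p[d] for p in group) / len(group) for d in range(6)])
    return tokens


def project(tokens, weight):
    """Linear projection from encoder width to the (assumed) LLM embedding width."""
    out_dim = len(weight[0])
    return [
        [sum(t[k] * weight[k][j] for k in range(len(t))) for j in range(out_dim)]
        for t in tokens
    ]


def splice(text_embeds, point_embeds, placeholder_index):
    """Replace the single placeholder embedding with the point embeddings."""
    return (
        text_embeds[:placeholder_index]
        + point_embeds
        + text_embeds[placeholder_index + 1:]
    )


# Toy inputs: 10 gray points along the x axis, an 8-wide "LLM" embedding space.
llm_dim = 8
points = [(i * 0.1, 0.0, 1.0, 0.5, 0.5, 0.5) for i in range(10)]
weight = [[0.01 * (k + j + 1) for j in range(llm_dim)] for k in range(6)]

text_embeds = [[0.0] * llm_dim for _ in range(5)]  # 5 text tokens, index 2 is the placeholder
point_embeds = project(encode_point_cloud(points), weight)
sequence = splice(text_embeds, point_embeds, placeholder_index=2)
print(len(sequence), len(sequence[0]))
```

The spliced sequence is what a language model would consume in place of plain text embeddings; in a two-stage setup like the one the abstract describes, the projection is the natural piece to train first when aligning the two latent spaces.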

Results

Task: Visual Question Answering (VQA)

| Dataset | Metric | Value | Model |
| 3D MM-Vet | Overall Accuracy | 46.6 | PointLLM-13B v1.2 |
| 3D MM-Vet | Overall Accuracy | 41.2 | PointLLM-7B v1.2 |

Tasks: 3D, Shape Representation Of 3D Point Clouds, 3D Object Classification, 3D Point Cloud Classification, 3D Classification, 3D Point Cloud Reconstruction, Generative 3D Object Classification (the same rows are listed under each of these task labels)

| Dataset | Metric | Value | Model |
| Objaverse | Objaverse (Average) | 54 | PointLLM-13B v1.2 |
| Objaverse | Objaverse (C) | 51.5 | PointLLM-13B v1.2 |
| Objaverse | Objaverse (I) | 56.5 | PointLLM-13B v1.2 |
| Objaverse | Objaverse (Average) | 53 | PointLLM-7B v1.2 |
| Objaverse | Objaverse (C) | 51 | PointLLM-7B v1.2 |
| Objaverse | Objaverse (I) | 55 | PointLLM-7B v1.2 |
| ModelNet40 | ModelNet40 (Average) | 52.78 | PointLLM-13B v1.2 |
| ModelNet40 | ModelNet40 (C) | 52.55 | PointLLM-13B v1.2 |
| ModelNet40 | ModelNet40 (I) | 53 | PointLLM-13B v1.2 |
| ModelNet40 | ModelNet40 (Average) | 52.63 | PointLLM-7B v1.2 |
| ModelNet40 | ModelNet40 (C) | 51.82 | PointLLM-7B v1.2 |
| ModelNet40 | ModelNet40 (I) | 53.44 | PointLLM-7B v1.2 |

Task: 3D Object Captioning

| Dataset | Metric | Value | Model |
| Objaverse | Sentence-BERT | 47.91 | PointLLM-13B v1.2 |
| Objaverse | Correctness | 3.1 | PointLLM-13B v1.2 |
| Objaverse | GPT-4 | 48.15 | PointLLM-13B v1.2 |
| Objaverse | Hallucination | 0.84 | PointLLM-13B v1.2 |
| Objaverse | Precision | 78.75 | PointLLM-13B v1.2 |
| Objaverse | SimCSE | 49.12 | PointLLM-13B v1.2 |
| Objaverse | Sentence-BERT | 47.47 | PointLLM-7B v1.2 |
| Objaverse | Correctness | 3.04 | PointLLM-7B v1.2 |
| Objaverse | GPT-4 | 44.85 | PointLLM-7B v1.2 |
| Objaverse | Hallucination | 0.66 | PointLLM-7B v1.2 |
| Objaverse | Precision | 82.14 | PointLLM-7B v1.2 |
| Objaverse | SimCSE | 48.55 | PointLLM-7B v1.2 |

Related Papers

Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits (2025-06-11)
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation (2025-06-11)
Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search (2025-06-08)
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment (2025-06-04)
ATLAS: Learning to Optimally Memorize the Context at Test Time (2025-05-29)
Spatial Knowledge Graph-Guided Multimodal Synthesis (2025-05-28)