


ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma

2024-02-27

3D Object Captioning, 3D Point Cloud Linear Classification, Zero-shot 3D classification, 3D geometry, Instruction Following, Visual Grounding, Multimodal Large Language Model, Zero-Shot Transfer 3D Point Cloud Classification, 3D Question Answering (3D-QA), Generative 3D Object Classification, Large Language Model, Few-Shot 3D Point Cloud Classification, 3D Point Cloud Classification, Language Modelling

Abstract

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point cloud input encoder for LLMs, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding. Project page: https://qizekun.github.io/shapellm/

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Question Answering (VQA) | 3D MM-Vet | Overall Accuracy | 53.1 | ShapeLLM-13B |
| Visual Question Answering (VQA) | 3D MM-Vet | Overall Accuracy | 47.4 | ShapeLLM-7B |
| 3D | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| 3D | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| 3D | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| 3D | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 98.8 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 97.59 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 95.25 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 95 | ReCon++ |
| Shape Representation Of 3D Point Clouds | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| Shape Representation Of 3D Point Clouds | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| Shape Representation Of 3D Point Clouds | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| Shape Representation Of 3D Point Clouds | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.5 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 98 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 2.3 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.5 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.5 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 0.8 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ_ONLY Accuracy (%) | 65.4 | ReCon++ |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Accuracy (%) | 87.3 | ReCon++ |
| 3D Object Classification | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| 3D Object Classification | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| 3D Object Classification | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| 3D Object Classification | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 98.8 | ReCon++ |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 97.59 | ReCon++ |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 95.25 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 95 | ReCon++ |
| 3D Point Cloud Classification | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| 3D Point Cloud Classification | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| 3D Point Cloud Classification | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| 3D Point Cloud Classification | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.5 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 98 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 2.3 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.5 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.5 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 0.8 | ReCon++ |
| 3D Point Cloud Classification | ScanObjectNN | OBJ_ONLY Accuracy (%) | 65.4 | ReCon++ |
| 3D Point Cloud Classification | ModelNet40 | Accuracy (%) | 87.3 | ReCon++ |
| 3D Classification | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| 3D Classification | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| 3D Classification | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| 3D Classification | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| 3D Point Cloud Linear Classification | ModelNet40 | Overall Accuracy | 93.6 | ReCon++ |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 98.8 | ReCon++ |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 97.59 | ReCon++ |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 95.25 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 95 | ReCon++ |
| 3D Point Cloud Reconstruction | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| 3D Point Cloud Reconstruction | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| 3D Point Cloud Reconstruction | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| 3D Point Cloud Reconstruction | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.5 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Overall Accuracy | 98 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Standard Deviation | 2.3 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.5 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Standard Deviation | 4.1 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.5 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Standard Deviation | 0.8 | ReCon++ |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ_ONLY Accuracy (%) | 65.4 | ReCon++ |
| 3D Point Cloud Reconstruction | ModelNet40 | Accuracy (%) | 87.3 | ReCon++ |
| Generative 3D Object Classification | Objaverse | Objaverse (Average) | 54.5 | ShapeLLM-7B |
| Generative 3D Object Classification | Objaverse | Objaverse (Average) | 54 | ShapeLLM-13B |
| Generative 3D Object Classification | ModelNet40 | ModelNet40 (Average) | 53.08 | ShapeLLM-7B |
| Generative 3D Object Classification | ModelNet40 | ModelNet40 (Average) | 52.96 | ShapeLLM-13B |
| 3D Object Captioning | Objaverse | Sentence-BERT | 48.52 | ShapeLLM-13B |
| 3D Object Captioning | Objaverse | GPT-4 | 48.94 | ShapeLLM-13B |
| 3D Object Captioning | Objaverse | SimCSE | 49.98 | ShapeLLM-13B |
| 3D Object Captioning | Objaverse | Sentence-BERT | 48.2 | ShapeLLM-7B |
| 3D Object Captioning | Objaverse | GPT-4 | 46.92 | ShapeLLM-7B |
| 3D Object Captioning | Objaverse | SimCSE | 49.23 | ShapeLLM-7B |
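
The few-shot ModelNet40 rows report an Overall Accuracy together with a Standard Deviation. A pair like this is commonly obtained by repeating the N-way K-shot experiment over several independent trials and summarizing the per-trial accuracies. The sketch below illustrates that aggregation only. The helper name `summarize_trials`, the number of trials, the use of the sample (n-1) standard deviation, and the trial values are all assumptions for illustration and are not taken from the paper or this page.

```python
from statistics import mean, stdev


def summarize_trials(accuracies):
    """Return (mean, sample standard deviation) of per-trial accuracies in percent.

    Hypothetical helper: the paper's exact aggregation protocol is not stated here.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two trials to estimate a standard deviation")
    return mean(accuracies), stdev(accuracies)


# Made-up per-trial accuracies (%), NOT results from the paper.
trials = [100.0, 98.0, 100.0, 99.0, 100.0]
avg, sd = summarize_trials(trials)
print(f"Overall Accuracy: {avg:.1f}, Standard Deviation: {sd:.1f}")
```

If a population standard deviation is wanted instead, `statistics.pstdev` can be swapped in for `stdev`. Which of the two the reported numbers use is not stated on this page.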
