Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models

Ivan Kapelyukh, Yifei Ren, Ignacio Alzugaray, Edward Johns

2023-12-07 | Object Rearrangement

Abstract

We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline. This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place. This enables language-conditioned rearrangement to be performed zero-shot, without needing to collect a training dataset of example arrangements. Results on a series of real-world tasks show that this framework is robust to distractors, controllable by language, capable of understanding complex multi-object relations, and readily applicable to both tabletop and 6-DoF rearrangement tasks.
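The render-and-evaluate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Pose`, `sample_candidate_poses`, `select_best_pose`, `toy_render`, and `toy_score_near` are all hypothetical names, the renderer is a stub, and the scoring function is a simple distance heuristic standing in for a real VLM that would score each rendered image against the user instruction.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical planar pose of the object being moved."""
    x: float
    y: float
    yaw: float = 0.0


def sample_candidate_poses(
    n: int,
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    seed: int = 0,
) -> List[Pose]:
    """Sample n candidate placements uniformly inside a rectangular workspace."""
    rng = random.Random(seed)
    return [Pose(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n)]


def select_best_pose(
    candidates: Sequence[Pose],
    render: Callable[[Pose], object],
    score: Callable[[object], float],
) -> Pose:
    """Render each virtual arrangement and return the highest-scoring pose.

    `render` stands in for rendering the 3D scene with the object at a pose;
    `score` stands in for a VLM rating the render against the instruction.
    """
    if not candidates:
        raise ValueError("at least one candidate pose is required")
    return max(candidates, key=lambda pose: score(render(pose)))


def toy_render(pose: Pose) -> dict:
    """Stub renderer (assumption): the 'image' is just the object's position."""
    return {"object_xy": (pose.x, pose.y)}


def toy_score_near(target_xy: Tuple[float, float]) -> Callable[[dict], float]:
    """Stub scorer (assumption): higher score when the object is nearer a target.

    A real system would instead compare the rendered image with the language
    instruction using a vision-language model.
    """
    def score(image: dict) -> float:
        ox, oy = image["object_xy"]
        return -math.hypot(ox - target_xy[0], oy - target_xy[1])
    return score


if __name__ == "__main__":
    candidates = sample_candidate_poses(50, (0.0, 1.0), (0.0, 1.0))
    best = select_best_pose(candidates, toy_render, toy_score_near((0.5, 0.4)))
    print("selected pose:", best)
```

In this sketch the selected pose would then be handed to a pick-and-place routine. Because selection only needs a renderer and a scorer, no dataset of example arrangements is involved, which is the zero-shot property the abstract describes.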

Results

Task                  Dataset      Metric      Value  Model
Object Rearrangement  Open6DOR V2  6-DoF       13.5   Dream2Real
Object Rearrangement  Open6DOR V2  pos-level0  11     Dream2Real
Object Rearrangement  Open6DOR V2  pos-level1  17.2   Dream2Real
Object Rearrangement  Open6DOR V2  rot-level0  37.3   Dream2Real
Object Rearrangement  Open6DOR V2  rot-level1  27.6   Dream2Real
Object Rearrangement  Open6DOR V2  rot-level2  26.2   Dream2Real

Related Papers

Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs (2025-06-03)
Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance (2025-05-22)
PARSEC: Preference Adaptation for Robotic Object Rearrangement from Scene Context (2025-05-16)
LangPert: Detecting and Handling Task-level Perturbations for Robust Object Rearrangement (2025-04-14)
Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation (2025-04-13)
Learning 3D Scene Analogies with Neural Contextual Scene Maps (2025-03-20)
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation (2025-02-18)
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs (2025-01-01)