Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki

2024-02-16 · Denoising · Robot Manipulation
Paper · PDF · Code

Abstract

Diffusion policies are conditional diffusion models that learn robot action distributions conditioned on the robot and environment state. They have recently been shown to outperform both deterministic and alternative action distribution learning formulations. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have been shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy equipped with a novel 3D denoising transformer that fuses information from the 3D visual scene, a language instruction, and proprioception to predict the noise in noised 3D robot pose trajectories. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 18.1% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it improves over the current SOTA by a 9% relative increase. It also learns to control a robot manipulator in the real world from a handful of demonstrations. Through thorough comparisons with the current SOTA policies and ablations of our model, we show that 3D Diffuser Actor's design choices dramatically outperform 2D representations, regression and classification objectives, absolute attentions, and holistic non-tokenized 3D scene embeddings.
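The core mechanism the abstract describes — a conditional denoiser that predicts the noise in noised 3D pose trajectories, iteratively refined from Gaussian noise at inference time — can be sketched with a generic DDPM-style sampling loop. This is a minimal toy illustration, not the paper's implementation: `predict_noise` here is a hypothetical zero-returning stand-in for the 3D denoising transformer, which in the actual method would condition on 3D scene tokens, the language instruction, and proprioception; the schedule, shapes, and step count are likewise assumed for illustration.

```python
import numpy as np

def predict_noise(noisy_traj, t, conditioning=None):
    """Hypothetical stand-in for the learned denoiser.

    The real model would attend over 3D scene features, language,
    and proprioception; this toy version just predicts zero noise.
    """
    return np.zeros_like(noisy_traj)

def sample_trajectory(horizon=8, pose_dim=3, steps=50, seed=0):
    """Sample a pose trajectory by iterative denoising (DDPM-style)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)          # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise over the whole trajectory.
    traj = rng.standard_normal((horizon, pose_dim))
    for t in reversed(range(steps)):
        eps = predict_noise(traj, t)
        # Posterior mean step: remove the predicted noise component.
        traj = (traj - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) \
               / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject scheduled noise except at the final step.
            traj += np.sqrt(betas[t]) * rng.standard_normal(traj.shape)
    return traj
```

At training time the objective is the reverse: noise a ground-truth demonstration trajectory to a random timestep and regress the denoiser's output against the injected noise, which is the standard noise-prediction diffusion loss the abstract refers to.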

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Robot Manipulation | CALVIN | avg. sequence length (D to D) | 3.35 | 3DDA |
| Robot Manipulation | CALVIN | avg. sequence length (D to D) | 3.27 | 3D Diffuser Actor |
| Robot Manipulation | RLBench | Input Image Size | 256 | 3D Diffuser Actor |
| Robot Manipulation | RLBench | Succ. Rate (18 tasks, 100 demos/task) | 81.3 | 3D Diffuser Actor |
| Robot Manipulation | RLBench | Training Time (A100 × hours) | 936 | 3D Diffuser Actor |
| Robot Manipulation | RLBench | Training Time (V100 × 8 × days) | 8 | 3D Diffuser Actor |
| Robot Manipulation | GEMBench | Average Success Rate | 44 | 3D Diffuser Actor |
| Robot Manipulation | The COLOSSEUM | Average decrease across all perturbations | -15.6 | 3D Diffuser Actor |
| Zero-shot Generalization | CALVIN | Avg. sequence length | 3.27 | 3D Diffuser Actor |

Related Papers

- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
- HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
- AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
- A statistical physics framework for optimal learning (2025-07-10)
- LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models (2025-07-08)
- Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)