TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Instruction-driven history-aware policies for robotic mani...

Instruction-driven history-aware policies for robotic manipulations

Pierre-Louis Guhur, ShiZhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid

2022-09-11Robot ManipulationRobot Manipulation Generalization
PaperPDFCodeCode(official)

Abstract

In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.

Results

TaskDatasetMetricValueModel
Robot ManipulationRLBenchSucc. Rate (10 tasks, 100 demos/task)83.3Hiveformer
Robot ManipulationRLBenchSucc. Rate (18 tasks, 100 demo/task)45.3Hiveformer
Robot ManipulationGEMBenchAverage Success Rate30.4Hiveformer

Related Papers

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge2025-07-06Geometry-aware 4D Video Generation for Robot Manipulation2025-07-01CapsDT: Diffusion-Transformer for Capsule Robot Manipulation2025-06-19Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation2025-06-18SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning2025-06-17What Matters in Learning from Large-Scale Datasets for Robot Manipulation2025-06-16Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success2025-06-12BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models2025-06-09