Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Tony Z. Zhao, Vikash Kumar, Sergey Levine, Chelsea Finn

2023-04-23 · Imitation Learning · Robot Manipulation · Chunking · Robot Manipulation Generalization

Abstract

Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/
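The abstract names action chunking but gives no implementation details, so the following is only a minimal sketch of the general idea: a policy that returns a short sequence of actions per query instead of a single action, so that fewer separate decisions are made over an episode. Everything here is an illustrative assumption, not the authors' code: the function `rollout_with_chunking`, the chunk size of 10, the 100-step horizon, and the toy one-dimensional policy and environment are all hypothetical, and the transformer and generative model from the paper are omitted entirely.

```python
from typing import Callable, List, Sequence

Observation = Sequence[float]
Action = Sequence[float]


def rollout_with_chunking(
    policy: Callable[[Observation], List[Action]],
    step: Callable[[Action], Observation],
    first_obs: Observation,
    horizon: int,
) -> int:
    """Run `horizon` control steps; return how many times the policy was queried."""
    obs = first_obs
    queries = 0
    executed = 0
    while executed < horizon:
        chunk = policy(obs)  # one query yields a whole action sequence
        if not chunk:
            raise ValueError("policy returned an empty action chunk")
        queries += 1
        # Execute the chunk (truncated at the horizon) without re-querying.
        for action in chunk[: horizon - executed]:
            obs = step(action)
            executed += 1
    return queries


# Toy stand-ins (hypothetical): a 1-D "robot" whose state is a single position.
CHUNK_SIZE = 10
state = [0.0]


def toy_policy(obs: Observation) -> List[Action]:
    # Plan to reach position 1.0 in CHUNK_SIZE equal increments.
    delta = (1.0 - obs[0]) / CHUNK_SIZE
    return [[delta] for _ in range(CHUNK_SIZE)]


def toy_step(action: Action) -> Observation:
    state[0] += action[0]
    return list(state)


num_queries = rollout_with_chunking(toy_policy, toy_step, list(state), horizon=100)
print(f"policy queries for 100 control steps: {num_queries}")
```

With a chunk size of 10, the 100-step rollout queries the policy 10 times rather than 100, which is the sense in which chunking shortens the effective decision horizon over which errors can compound.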

Results

Task	Dataset	Metric	Value	Model
Robot Manipulation	MimicGen	Succ. Rate (12 tasks, 100 demo/task)	21.3	ACT (Evaluated in EquiDiff)
Robot Manipulation	MimicGen	Succ. Rate (12 tasks, 200 demo/task)	38.2	ACT (Evaluated in EquiDiff)
Robot Manipulation	MimicGen	Succ. Rate (12 tasks, 1000 demo/task)	63.3	ACT (Evaluated in EquiDiff)
Robot Manipulation	The COLOSSEUM	Average decrease across all perturbations	-61.8	ACT
