Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation

Stephen James, Kentaro Wada, Tristan Laidlow, Andrew J. Davison

2021-06-23CVPR 2022 1Continuous Control Translation Robot Manipulation Q-Learning

Abstract

We present a coarse-to-fine discretisation method that enables the use of discrete reinforcement learning approaches in place of unstable and data-inefficient actor-critic methods in continuous robotics domains. This approach builds on the recently released ARM algorithm, which replaces the continuous next-best pose agent with a discrete one, with coarse-to-fine Q-attention. Given a voxelised scene, coarse-to-fine Q-attention learns what part of the scene to 'zoom' into. When this 'zooming' behaviour is applied iteratively, it results in a near-lossless discretisation of the translation space, and allows the use of a discrete action, deep Q-learning method. We show that our new coarse-to-fine algorithm achieves state-of-the-art performance on several difficult sparsely rewarded RLBench vision-based robotics tasks, and can train real-world policies, tabula rasa, in a matter of minutes, with as little as 3 demonstrations.

Results

Task	Dataset	Metric	Value	Model
Robot Manipulation	RLBench	Input Image Size	128	C2FARM-BC (Evaluated in PerAct)
Robot Manipulation	RLBench	Succ. Rate (18 tasks, 100 demo/task)	20.1	C2FARM-BC (Evaluated in PerAct)

Related Papers

Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)2025-07-17 A Translation of Probabilistic Event Calculus into Markov Decision Processes2025-07-17 Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour2025-07-17 Function-to-Style Guidance of LLMs for Code Translation2025-07-15 Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing2025-07-15 Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation2025-07-09 Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings2025-07-09 Unconditional Diffusion for Generative Sequential Recommendation2025-07-08