A Generalist Agent

Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

2022-05-12 · DeepMind 2022 5
Tasks: Skill Mastery, Language Modelling, Skill Generalization


Abstract

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
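The abstract says one network decides from context whether to emit text, joint torques, or button presses. This works only if every modality is mapped into a single shared token space. The sketch below shows one way to do that, and several parts of it are assumptions not stated in this abstract. The constants are modeled on the tokenization described in the full Gato paper: a 32000-token text vocabulary, 1024 bins for continuous values, and mu-law parameters mu = 100, M = 256. The helper names and the separator id (`SEPARATOR_TOKEN`) are hypothetical. This is not DeepMind's implementation.

```python
import math
from typing import List, Tuple

# ASSUMED constants, modeled on the tokenization in the full Gato paper.
TEXT_VOCAB_SIZE = 32000               # text token ids occupy [0, 32000)
NUM_BINS = 1024                       # continuous values are quantized into 1024 bins
MU, M = 100.0, 256.0                  # mu-law companding parameters
SEPARATOR_TOKEN = TEXT_VOCAB_SIZE + NUM_BINS  # HYPOTHETICAL id for the observation/action separator


def mu_law(x: float) -> float:
    """Compress a real value with mu-law companding; |x| <= M maps into [-1, 1]."""
    return math.copysign(math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0), x)


def continuous_to_token(x: float) -> int:
    """Map a continuous value (e.g. a joint torque) to an id placed after the text vocabulary."""
    y = max(-1.0, min(1.0, mu_law(x)))           # clip to [-1, 1]
    bin_index = int((y + 1.0) / 2.0 * NUM_BINS)  # uniform bins over [-1, 1]
    bin_index = min(bin_index, NUM_BINS - 1)     # y == 1.0 belongs to the last bin
    return TEXT_VOCAB_SIZE + bin_index


def discrete_to_token(a: int) -> int:
    """Discrete values such as button indices are used directly as ids in [0, NUM_BINS)."""
    if not 0 <= a < NUM_BINS:
        raise ValueError(f"discrete value {a} outside [0, {NUM_BINS})")
    return a


def serialize_timestep(obs: List[float], action: List[float]) -> Tuple[List[int], List[bool]]:
    """Flatten one timestep into [observation tokens, separator, action tokens].

    Also returns a loss mask that is True only for action tokens, so a sequence
    model trained on it learns to predict actions conditioned on observations.
    """
    obs_tokens = [continuous_to_token(v) for v in obs]
    act_tokens = [continuous_to_token(v) for v in action]
    tokens = obs_tokens + [SEPARATOR_TOKEN] + act_tokens
    mask = [False] * (len(obs_tokens) + 1) + [True] * len(act_tokens)
    return tokens, mask


if __name__ == "__main__":
    tokens, mask = serialize_timestep(obs=[0.0, -0.5], action=[256.0])
    print(tokens)
    print(mask)
```

The sketch masks the loss so that only action tokens are prediction targets. This is one simple, assumed way to make a single autoregressive model act as a policy. Its output modality then depends on the context it is given rather than on a separate output head.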

Results

Task	Dataset	Metric	Value	Model
Skill Generalization	RGB-Stacking	Average	50.2	Gato
Skill Generalization	RGB-Stacking	Group 1	24.5	Gato
Skill Generalization	RGB-Stacking	Group 2	33	Gato
Skill Generalization	RGB-Stacking	Group 3	50.5	Gato
Skill Generalization	RGB-Stacking	Group 4	76.5	Gato
Skill Generalization	RGB-Stacking	Group 5	66.5	Gato
Skill Mastery	RGB-Stacking	Average	75.6	Gato
Skill Mastery	RGB-Stacking	Group 1	58	Gato
Skill Mastery	RGB-Stacking	Group 2	57.6	Gato
Skill Mastery	RGB-Stacking	Group 3	78.5	Gato
Skill Mastery	RGB-Stacking	Group 4	89	Gato
Skill Mastery	RGB-Stacking	Group 5	95.1	Gato
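The table does not say how the Average rows are computed. They are, however, consistent with an unweighted mean of the five group scores. The short check below uses only the values from the table above and recomputes both averages, rounding to one decimal place as the table does. The function name `group_averages` is illustrative.

```python
import statistics
from typing import Dict, List

# Per-group Gato scores on RGB-Stacking, copied from the results table above.
SCORES: Dict[str, List[float]] = {
    "Skill Generalization": [24.5, 33.0, 50.5, 76.5, 66.5],
    "Skill Mastery": [58.0, 57.6, 78.5, 89.0, 95.1],
}


def group_averages(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Unweighted mean of the group scores for each task, rounded to one decimal place."""
    return {task: round(statistics.mean(vals), 1) for task, vals in scores.items()}


if __name__ == "__main__":
    for task, avg in group_averages(SCORES).items():
        print(f"{task}: {avg}")
```

Both recomputed means (50.2 and 75.6) match the reported Average values. Each group therefore appears to be weighted equally.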
