Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Generalist Agent

Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

2022-05-12 | DeepMind 2022 5 | Skill Mastery, Language Modelling, Skill Generalization

Abstract

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.

Results

Task                  Dataset       Metric   Value  Model
Skill Generalization  RGB-Stacking  Average  50.2   Gato
Skill Generalization  RGB-Stacking  Group 1  24.5   Gato
Skill Generalization  RGB-Stacking  Group 2  33     Gato
Skill Generalization  RGB-Stacking  Group 3  50.5   Gato
Skill Generalization  RGB-Stacking  Group 4  76.5   Gato
Skill Generalization  RGB-Stacking  Group 5  66.5   Gato
Skill Mastery         RGB-Stacking  Average  75.6   Gato
Skill Mastery         RGB-Stacking  Group 1  58     Gato
Skill Mastery         RGB-Stacking  Group 2  57.6   Gato
Skill Mastery         RGB-Stacking  Group 3  78.5   Gato
Skill Mastery         RGB-Stacking  Group 4  89     Gato
Skill Mastery         RGB-Stacking  Group 5  95.1   Gato
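
The Average rows are consistent with an unweighted mean of the five group scores, rounded to one decimal place. The page does not state how the average is computed, so this is an observation from the listed numbers only. The short Python check below is a sketch under that assumption; the names `group_scores`, `reported_average`, and `unweighted_average` are hypothetical and chosen for illustration.

```python
from statistics import mean

# Per-group scores copied from the results table (Groups 1 to 5).
group_scores = {
    "Skill Generalization": [24.5, 33.0, 50.5, 76.5, 66.5],
    "Skill Mastery": [58.0, 57.6, 78.5, 89.0, 95.1],
}

# "Average" values as listed in the results table.
reported_average = {
    "Skill Generalization": 50.2,
    "Skill Mastery": 75.6,
}


def unweighted_average(scores):
    """Return the plain mean of the group scores, rounded to one decimal.

    Assumption: every group carries equal weight. The page does not confirm this.
    """
    return round(mean(scores), 1)


for task, scores in group_scores.items():
    computed = unweighted_average(scores)
    matches = abs(computed - reported_average[task]) < 0.05
    print(f"{task}: computed={computed}, reported={reported_average[task]}, match={matches}")
```

Both tasks match under this assumption, which suggests the groups are weighted equally, though a different weighting that happens to give the same rounded values cannot be ruled out from the table alone.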

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)