Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

2019-11-19 | Atari Games 100k | Game of Go | Atari Games

Abstract

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
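The abstract's description of a model that is "applied iteratively" to predict reward, policy, and value can be illustrated with a small sketch. This is not the authors' implementation: the class `ToyLearnedModel`, the function `unroll`, the method names `represent`, `dynamics`, and `predict`, and all dimensions are hypothetical, and the weights are random rather than trained. The sketch only shows the data flow: an observation is encoded once into a hidden state, and that hidden state is then advanced one action at a time with no access to the real environment.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix as a list of rows (stand-in for trained weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax, so the policy sums to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyLearnedModel:
    """Hypothetical stand-in for a learned model; untrained, for illustration only."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.num_actions = num_actions
        self.w_repr = make_matrix(hidden_dim, obs_dim, rng)
        self.w_dyn = make_matrix(hidden_dim, hidden_dim + num_actions, rng)
        self.w_reward = make_matrix(1, hidden_dim + num_actions, rng)
        self.w_policy = make_matrix(num_actions, hidden_dim, rng)
        self.w_value = make_matrix(1, hidden_dim, rng)

    def represent(self, observation):
        """Encode a real observation into an initial hidden state."""
        return [math.tanh(x) for x in matvec(self.w_repr, observation)]

    def dynamics(self, state, action):
        """Advance the hidden state by one action and predict the reward."""
        one_hot = [1.0 if i == action else 0.0 for i in range(self.num_actions)]
        model_input = state + one_hot
        next_state = [math.tanh(x) for x in matvec(self.w_dyn, model_input)]
        reward = matvec(self.w_reward, model_input)[0]
        return next_state, reward

    def predict(self, state):
        """Predict the action-selection policy and the value from a hidden state."""
        policy = softmax(matvec(self.w_policy, state))
        value = matvec(self.w_value, state)[0]
        return policy, value


def unroll(model, observation, actions):
    """Apply the model iteratively along a sequence of hypothetical actions."""
    state = model.represent(observation)
    policy, value = model.predict(state)
    steps = [{"reward": 0.0, "policy": policy, "value": value}]  # root: no reward yet
    for action in actions:
        state, reward = model.dynamics(state, action)
        policy, value = model.predict(state)
        steps.append({"reward": reward, "policy": policy, "value": value})
    return steps
```

Calling `unroll(ToyLearnedModel(obs_dim=4, hidden_dim=8, num_actions=3), [0.1, 0.2, 0.3, 0.4], [0, 2, 1])` returns four entries, one for the root and one per action. In a planner, a tree search would query the same three functions at each node instead of following a fixed action sequence.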

Results

Task | Dataset | Metric | Value | Model
Atari Games | atari game | Human World Record Breakthrough | 19 | Muzero
Atari Games | Atari 2600 Boxing | Score | 100 | MuZero
Atari Games | Atari 2600 Skiing | Score | -29968.36 | MuZero
Atari Games | Atari 2600 Double Dunk | Score | 23.94 | MuZero
Atari Games | Atari 2600 Ms. Pacman | Score | 243401.1 | MuZero
Atari Games | Atari 2600 Centipede | Score | 1159049.27 | MuZero
Atari Games | Atari 2600 Tutankham | Score | 491.48 | MuZero
Atari Games | Atari 2600 Freeway | Score | 33.03 | MuZero
Atari Games | Atari 2600 Pong | Score | 21 | MuZero
Atari Games | Atari 2600 Enduro | Score | 2382.44 | MuZero
Atari Games | Atari 2600 Krull | Score | 269358.27 | MuZero
Atari Games | Atari 2600 Frostbite | Score | 631378.53 | MuZero
Atari Games | Atari 2600 Yars Revenge | Score | 553311.46 | MuZero
Atari Games | Atari 2600 Gopher | Score | 130345.58 | MuZero
Atari Games | Atari 2600 Space Invaders | Score | 74335.3 | MuZero
Atari Games | Atari 2600 James Bond | Score | 41063.25 | MuZero
Atari Games | Atari 2600 Amidar | Score | 28634.39 | MuZero
Atari Games | Atari 2600 Crazy Climber | Score | 458315.4 | MuZero
Atari Games | Atari 2600 Asteroids | Score | 678558.64 | MuZero
Atari Games | Atari 2600 Gravitar | Score | 6682.7 | MuZero
Atari Games | Atari 2600 Time Pilot | Score | 476763.9 | MuZero
Atari Games | Atari 2600 Demon Attack | Score | 143964.26 | MuZero
Atari Games | Atari 2600 Battle Zone | Score | 848623 | MuZero
Atari Games | Atari 2600 Phoenix | Score | 955137.84 | MuZero
Atari Games | Atari 2600 Beam Rider | Score | 454993.53 | MuZero
Atari Games | Atari 2600 Asterix | Score | 998425 | MuZero
Atari Games | Atari 2600 Kung-Fu Master | Score | 204824 | MuZero
Atari Games | Atari 2600 Bowling | Score | 260.13 | MuZero
Atari Games | Atari 2600 Kangaroo | Score | 16763.6 | MuZero
Atari Games | Atari 2600 Assault | Score | 143972.03 | MuZero
Atari Games | Atari 2600 Alien | Score | 741812.63 | MuZero
Atari Games | Atari 2600 Fishing Derby | Score | 91.16 | MuZero
Atari Games | Atari 2600 Seaquest | Score | 999976.52 | MuZero
Atari Games | Atari 2600 Chopper Command | Score | 991039.7 | MuZero
Atari Games | Atari-57 | Human World Record Breakthrough | 19 | MuZero
Atari Games | Atari 2600 Solaris | Score | 56.62 | MuZero
Atari Games | Atari 2600 Surround | Score | 9.99 | MuZero
Atari Games | Atari 2600 Video Pinball | Score | 981791.88 | MuZero
Atari Games | Atari 2600 Wizard of Wor | Score | 197126 | MuZero
Atari Games | Atari 2600 Zaxxon | Score | 725853.9 | MuZero
Atari Games | Atari 2600 Defender | Score | 839642.95 | MuZero
Atari Games | Atari 2600 Robotank | Score | 131.13 | MuZero
Atari Games | Atari 2600 Name This Game | Score | 157177.85 | MuZero
Atari Games | Atari 2600 Star Gunner | Score | 549271.7 | MuZero
Atari Games | Atari 2600 Ice Hockey | Score | 67.04 | MuZero
Atari Games | Atari 2600 Berzerk | Score | 85932.6 | MuZero
Atari Games | Atari 2600 Atlantis | Score | 1674767.2 | MuZero
Atari Games | Atari 2600 HERO | Score | 49244.11 | MuZero
Atari Games | Atari 2600 Bank Heist | Score | 1278.98 | MuZero
Atari Games | Atari 2600 Venture | Score | 0.4 | MuZero
Atari Games | Atari 2600 Private Eye | Score | 15299.98 | MuZero
Atari Games | Atari 2600 Q*Bert | Score | 72276 | MuZero
Atari Games | Atari 2600 River Raid | Score | 323417.18 | MuZero
Atari Games | Atari 2600 Road Runner | Score | 613411.8 | MuZero
Atari Games | Atari 2600 Up and Down | Score | 715545.61 | MuZero
Video Games | atari game | Human World Record Breakthrough | 19 | Muzero
Video Games | Atari 2600 Boxing | Score | 100 | MuZero
Video Games | Atari 2600 Skiing | Score | -29968.36 | MuZero
Video Games | Atari 2600 Double Dunk | Score | 23.94 | MuZero
Video Games | Atari 2600 Ms. Pacman | Score | 243401.1 | MuZero
Video Games | Atari 2600 Centipede | Score | 1159049.27 | MuZero
Video Games | Atari 2600 Tutankham | Score | 491.48 | MuZero
Video Games | Atari 2600 Freeway | Score | 33.03 | MuZero
Video Games | Atari 2600 Pong | Score | 21 | MuZero
Video Games | Atari 2600 Enduro | Score | 2382.44 | MuZero
Video Games | Atari 2600 Krull | Score | 269358.27 | MuZero
Video Games | Atari 2600 Frostbite | Score | 631378.53 | MuZero
Video Games | Atari 2600 Yars Revenge | Score | 553311.46 | MuZero
Video Games | Atari 2600 Gopher | Score | 130345.58 | MuZero
Video Games | Atari 2600 Space Invaders | Score | 74335.3 | MuZero
Video Games | Atari 2600 James Bond | Score | 41063.25 | MuZero
Video Games | Atari 2600 Amidar | Score | 28634.39 | MuZero
Video Games | Atari 2600 Crazy Climber | Score | 458315.4 | MuZero
Video Games | Atari 2600 Asteroids | Score | 678558.64 | MuZero
Video Games | Atari 2600 Gravitar | Score | 6682.7 | MuZero
Video Games | Atari 2600 Time Pilot | Score | 476763.9 | MuZero
Video Games | Atari 2600 Demon Attack | Score | 143964.26 | MuZero
Video Games | Atari 2600 Battle Zone | Score | 848623 | MuZero
Video Games | Atari 2600 Phoenix | Score | 955137.84 | MuZero
Video Games | Atari 2600 Beam Rider | Score | 454993.53 | MuZero
Video Games | Atari 2600 Asterix | Score | 998425 | MuZero
Video Games | Atari 2600 Kung-Fu Master | Score | 204824 | MuZero
Video Games | Atari 2600 Bowling | Score | 260.13 | MuZero
Video Games | Atari 2600 Kangaroo | Score | 16763.6 | MuZero
Video Games | Atari 2600 Assault | Score | 143972.03 | MuZero
Video Games | Atari 2600 Alien | Score | 741812.63 | MuZero
Video Games | Atari 2600 Fishing Derby | Score | 91.16 | MuZero
Video Games | Atari 2600 Seaquest | Score | 999976.52 | MuZero
Video Games | Atari 2600 Chopper Command | Score | 991039.7 | MuZero
Video Games | Atari-57 | Human World Record Breakthrough | 19 | MuZero
Video Games | Atari 2600 Solaris | Score | 56.62 | MuZero
Video Games | Atari 2600 Surround | Score | 9.99 | MuZero
Video Games | Atari 2600 Video Pinball | Score | 981791.88 | MuZero
Video Games | Atari 2600 Wizard of Wor | Score | 197126 | MuZero
Video Games | Atari 2600 Zaxxon | Score | 725853.9 | MuZero
Video Games | Atari 2600 Defender | Score | 839642.95 | MuZero
Video Games | Atari 2600 Robotank | Score | 131.13 | MuZero
Video Games | Atari 2600 Name This Game | Score | 157177.85 | MuZero
Video Games | Atari 2600 Star Gunner | Score | 549271.7 | MuZero
Video Games | Atari 2600 Ice Hockey | Score | 67.04 | MuZero
Video Games | Atari 2600 Berzerk | Score | 85932.6 | MuZero
Video Games | Atari 2600 Atlantis | Score | 1674767.2 | MuZero
Video Games | Atari 2600 HERO | Score | 49244.11 | MuZero
Video Games | Atari 2600 Bank Heist | Score | 1278.98 | MuZero
Video Games | Atari 2600 Venture | Score | 0.4 | MuZero
Video Games | Atari 2600 Private Eye | Score | 15299.98 | MuZero
Video Games | Atari 2600 Q*Bert | Score | 72276 | MuZero
Video Games | Atari 2600 River Raid | Score | 323417.18 | MuZero
Video Games | Atari 2600 Road Runner | Score | 613411.8 | MuZero
Video Games | Atari 2600 Up and Down | Score | 715545.61 | MuZero

Related Papers

Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains (2025-07-02)
A Principled Path to Fitted Distributional Evaluation (2025-06-24)
Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments (2025-06-17)
Meta-learning how to Share Credit among Macro-Actions (2025-06-16)
TextAtari: 100K Frames Game Playing with Language Agents (2025-06-04)
Improving Performance of Spike-based Deep Q-Learning using Ternary Neurons (2025-06-03)
Automatic Reward Shaping from Confounded Offline Data (2025-05-16)
Unraveling the Rainbow: can value-based methods schedule? (2025-05-06)