Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MuZero

Reinforcement Learning · Introduced 2020 · 46 papers
Source Paper

Description

MuZero is a model-based reinforcement learning algorithm. It builds upon AlphaZero's search and search-based policy iteration algorithms, but incorporates a learned model into the training procedure.

The main idea of the algorithm is to predict those aspects of the future that are directly relevant for planning. The model receives the observation (e.g. an image of the Go board or the Atari screen) as an input and transforms it into a hidden state. The hidden state is then updated iteratively by a recurrent process that receives the previous hidden state and a hypothetical next action. At every one of these steps the model predicts the policy (e.g. the move to play), value function (e.g. the predicted winner), and immediate reward (e.g. the points scored by playing a move). The model is trained end-to-end, with the sole objective of accurately estimating these three important quantities, so as to match the improved estimates of policy and value generated by search as well as the observed reward.

There is no direct constraint or requirement for the hidden state to capture all information necessary to reconstruct the original observation, drastically reducing the amount of information the model has to maintain and predict; nor is there any requirement for the hidden state to match the unknown, true state of the environment; nor any other constraints on the semantics of state. Instead, the hidden states are free to represent state in whatever way is relevant to predicting current and future values and policies. Intuitively, the agent can invent, internally, the rules or dynamics that lead to most accurate planning.
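The three learned components described above (observation encoding, recurrent dynamics, and the policy/value/reward predictions) can be sketched as follows. This is a minimal illustration, not the real architecture: the function names follow the paper's notation (h for representation, g for dynamics, f for prediction), but the linear maps, dimensions, and action sequence are placeholder assumptions.

```python
import numpy as np

# Illustrative stand-ins for MuZero's learned functions. In the actual
# algorithm these are deep networks trained end-to-end; here they are
# fixed random linear maps so the unroll structure is visible.
rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 8, 4, 3  # assumed toy sizes

W_h = rng.normal(size=(HIDDEN_DIM, OBS_DIM))                    # representation h
W_g = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM + NUM_ACTIONS))   # dynamics g
W_pi = rng.normal(size=(NUM_ACTIONS, HIDDEN_DIM))               # policy head of f
w_v = rng.normal(size=HIDDEN_DIM)                               # value head of f
w_r = rng.normal(size=HIDDEN_DIM + NUM_ACTIONS)                 # reward head of g

def representation(obs):
    """h: encode the raw observation into an initial hidden state."""
    return np.tanh(W_h @ obs)

def dynamics(state, action):
    """g: advance the hidden state given a hypothetical action,
    also predicting the immediate reward for that transition."""
    x = np.concatenate([state, np.eye(NUM_ACTIONS)[action]])  # one-hot action
    return np.tanh(W_g @ x), float(w_r @ x)

def prediction(state):
    """f: predict a policy distribution and a value from a hidden state."""
    logits = W_pi @ state
    policy = np.exp(logits) / np.exp(logits).sum()
    return policy, float(w_v @ state)

# Unroll the model several steps from one observation along a hypothetical
# action sequence -- the quantities matched against search targets during training.
obs = rng.normal(size=OBS_DIM)
state = representation(obs)
for action in [0, 2, 1]:
    policy, value = prediction(state)
    state, reward = dynamics(state, action)
    print(f"action={action} value={value:+.3f} reward={reward:+.3f}")
```

Note that nothing in the unroll ever decodes the hidden state back into an observation, which mirrors the point above: the state only needs to support accurate policy, value, and reward predictions.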

Papers Using This Method

Calibrated Value-Aware Model Learning with Stochastic Environment Models · 2025-05-28
OptionZero: Planning with Learned Options · 2025-02-23
Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMind's Innovations · 2025-02-14
Evaluating World Models with LLM for Decision Making · 2024-11-13
Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping · 2024-11-07
Interpreting the Learned Model in MuZero Planning · 2024-11-07
Combining AI Control Systems and Human Decision Support via Robustness and Criticality · 2024-07-03
Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction · 2024-06-02
Efficient Multi-agent Reinforcement Learning by Planning · 2024-05-20
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze · 2024-04-25
MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games · 2023-10-17
Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction · 2023-10-10
Self-Predictive Universal AI · 2023-09-21
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning · 2023-08-07
$λ$-models: Effective Decision-Aware Reinforcement Learning with Latent Models · 2023-06-30
What model does MuZero learn? · 2023-06-01
Model Predictive Control with Self-supervised Representation Learning · 2023-04-14
Equivariant MuZero · 2023-02-09
Epistemic Monte Carlo Tree Search · 2022-10-21
Efficient Offline Policy Optimization with a Learned Model · 2022-10-12