Muesli: Combining Improvements in Policy Optimization

Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

2021-04-13 | Atari Games, Continuous Control

Abstract

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.

Results

Task	Dataset	Metric	Value	Model
Atari Games	atari game	Human World Record Breakthrough	5	Muesli
Video Games	atari game	Human World Record Breakthrough	5	Muesli
