Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning and Planning in Complex Action Spaces

Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver

Published 2021-04-13 · Tasks: Game of Go, Continuous Control

Abstract

Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
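The core idea of the framework can be illustrated in a few lines: instead of enumerating the full action space, draw a small subset of actions from the current policy, evaluate only those, and shift probability mass toward the better-scoring ones. The sketch below is a simplified illustration under assumptions, not the authors' implementation; `policy_sample` and `q_value` are hypothetical stand-ins for the learned policy and the search-based values used in Sampled MuZero.

```python
import numpy as np

def sampled_policy_improvement(policy_sample, q_value,
                               num_samples=20, temperature=1.0, rng=None):
    """One step of sample-based policy improvement (simplified sketch).

    policy_sample: callable(rng) -> action, draws from the current policy.
    q_value: callable(action) -> float, evaluates a single action
             (stand-in for the value estimates produced by planning).
    Returns the sampled actions and an improved distribution over them.
    """
    rng = rng or np.random.default_rng(0)
    # Enumerate only a small sampled subset of the (possibly continuous
    # or high-dimensional) action space.
    actions = [policy_sample(rng) for _ in range(num_samples)]
    # Softmax over the Q-values of the sampled actions: probability mass
    # moves toward higher-value actions, approximating policy improvement
    # restricted to the sampled subset.
    q = np.array([q_value(a) for a in actions])
    logits = q / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return actions, probs

# Toy 1-D continuous-control example: value peaks at action a = 0.5.
actions, probs = sampled_policy_improvement(
    policy_sample=lambda rng: rng.uniform(-1.0, 1.0),
    q_value=lambda a: -(a - 0.5) ** 2,
)
best = actions[int(np.argmax(probs))]
```

Because the improved distribution is supported only on the sampled actions, the same recipe applies unchanged whether the action space is discrete, continuous, or high-dimensional, which is the point the paper generalizes to full policy-iteration algorithms such as MuZero.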

Results

Task | Dataset | Metric | Value | Model
Continuous Control | walker.walk | Return | 975.46 | SMuZero
Continuous Control | walker.stand | Return | 987.79 | SMuZero
Continuous Control | hopper.hop | Return | 528.24 | SMuZero
Continuous Control | hopper.stand | Return | 926.5 | SMuZero
Continuous Control | walker.run | Return | 931.06 | SMuZero
Continuous Control | cheetah.run | Return | 914.39 | SMuZero
Continuous Control | cartpole.balance_sparse | Return | 998.14 | SMuZero
Continuous Control | cartpole.swingup | Return | 868.87 | SMuZero
Continuous Control | quadruped.walk | Return | 933.77 | SMuZero
Continuous Control | ball_in_cup.catch | Return | 977.38 | SMuZero
Continuous Control | reacher.easy | Return | 982.26 | SMuZero
Continuous Control | reacher.hard | Return | 971.53 | SMuZero
Continuous Control | finger.turn_hard | Return | 963.07 | SMuZero
Continuous Control | quadruped.run | Return | 923.54 | SMuZero
Continuous Control | pendulum.swingup | Return | 837.76 | SMuZero
Continuous Control | cartpole.swingup_sparse | Return | 846.91 | SMuZero
Continuous Control | finger.turn_easy | Return | 972.53 | SMuZero
Continuous Control | finger.spin | Return | 986.38 | SMuZero
Continuous Control | cartpole.balance | Return | 984.86 | SMuZero
Continuous Control | acrobot.swingup | Return | 417.52 | SMuZero

Related Papers

- Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) (2025-07-17)
- rQdia: Regularizing Q-Value Distributions With Image Augmentation (2025-06-26)
- Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity (2025-06-20)
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute (2025-06-18)
- Scaling Algorithm Distillation for Continuous Control with Mamba (2025-06-16)
- DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty (2025-06-14)
- Wasserstein Barycenter Soft Actor-Critic (2025-06-11)
- Reinforcement Learning via Implicit Imitation Guidance (2025-06-09)