
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Mean Actor Critic

Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

2017-09-01 | Reinforcement Learning, Atari Games, reinforcement-learning

Abstract

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate relative to traditional actor-critic methods. We show empirical results on two control domains and on six Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
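
The idea in the abstract can be illustrated with a minimal sketch for a single state. It assumes a tabular softmax policy parameterised directly by its logits and a fixed vector of critic estimates Q(s, a); the function names and the numbers are hypothetical and are not taken from the authors' code. The all-action estimate differentiates the sum over a of pi(a|s) Q(s, a), while the usual actor-critic estimate uses only the one sampled action.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mac_gradient(logits, q_values):
    """All-action estimate: gradient of sum_a pi(a|s) * Q(s, a) w.r.t. the logits.

    For a softmax policy this simplifies to pi_k * (Q_k - V),
    where V = sum_a pi_a * Q_a. Q is treated as a constant (an assumption).
    """
    pi = softmax(logits)
    v = sum(p * q for p, q in zip(pi, q_values))
    return [p * (q - v) for p, q in zip(pi, q_values)]


def sampled_gradient(logits, q_values, action):
    """Single-action estimate: grad log pi(action|s) * Q(s, action)."""
    pi = softmax(logits)
    return [
        ((1.0 if k == action else 0.0) - pi[k]) * q_values[action]
        for k in range(len(logits))
    ]


def expected_sampled_gradient(logits, q_values):
    """Expectation of the single-action estimate under a ~ pi(.|s)."""
    pi = softmax(logits)
    n = len(logits)
    grads = [sampled_gradient(logits, q_values, a) for a in range(n)]
    return [sum(pi[a] * grads[a][k] for a in range(n)) for k in range(n)]


if __name__ == "__main__":
    logits = [0.2, -1.0, 0.5]    # hypothetical policy logits for one state
    q_values = [1.0, 3.0, -2.0]  # hypothetical critic estimates Q(s, a)
    print("all-action estimate:  ", mac_gradient(logits, q_values))
    for a in range(len(logits)):
        print(f"sampled, action {a}:   ", sampled_gradient(logits, q_values, a))
```

In this sketch the single-action estimates differ from one sampled action to the next, whereas the all-action estimate equals their expectation and does not depend on which action was executed. That is the intuition behind the variance reduction claimed in the abstract; the formal statement and proof are in the paper itself.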

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Continuous Control | Cart Pole (OpenAI Gym) | Score | 178.3 | MAC |
| Continuous Control | Lunar Lander (OpenAI Gym) | Score | 163.5 | MAC |
| Atari Games | Atari 2600 Pong | Score | 10.6 | MAC |
| Atari Games | Atari 2600 Breakout | Score | 372.7 | MAC |
| Atari Games | Atari 2600 Space Invaders | Score | 1173.1 | MAC |
| Atari Games | Atari 2600 Beam Rider | Score | 6072 | MAC |
| Atari Games | Atari 2600 Seaquest | Score | 1703.4 | MAC |
| Atari Games | Atari 2600 Q*Bert | Score | 243.4 | MAC |
