Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate relative to traditional actor-critic methods. We show empirical results on two control domains and on six Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
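The abstract's key idea, averaging the policy gradient over all action values rather than only sampled actions, can be sketched for a single state with a softmax policy. This is an illustrative sketch, not the paper's implementation; the function names and the toy logits/Q-values are assumptions for the example.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def mac_gradient(logits, q_values):
    """MAC-style policy-gradient term for one state:
    sum over ALL actions of Q(s,a) * d pi(a|s) / d logits,
    instead of the single sampled-action estimator
    Q(s,a) * d log pi(a|s) used by traditional actor-critic."""
    pi = softmax(logits)
    # Jacobian of softmax: J[a, k] = pi[a] * (delta(a,k) - pi[k])
    jac = np.diag(pi) - np.outer(pi, pi)
    return jac.T @ q_values  # gradient w.r.t. the logits

# Toy example (hypothetical values): three discrete actions
logits = np.array([0.1, 0.5, -0.2])
q = np.array([1.0, 2.0, 0.5])
g = mac_gradient(logits, q)
```

Because the gradient algebraically reduces to `pi_k * (Q_k - V)` per action, probability mass shifts toward above-average actions, and no sampling over actions is needed, which is the source of the variance reduction the paper proves.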
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Continuous Control | Cart Pole (OpenAI Gym) | Score | 178.3 | MAC |
| Continuous Control | Lunar Lander (OpenAI Gym) | Score | 163.5 | MAC |
| Atari Games | Atari 2600 Pong | Score | 10.6 | MAC |
| Atari Games | Atari 2600 Breakout | Score | 372.7 | MAC |
| Atari Games | Atari 2600 Space Invaders | Score | 1173.1 | MAC |
| Atari Games | Atari 2600 Beam Rider | Score | 6072 | MAC |
| Atari Games | Atari 2600 Seaquest | Score | 1703.4 | MAC |
| Atari Games | Atari 2600 Q*Bert | Score | 243.4 | MAC |