Scott Fujimoto, Herke van Hoof, David Meger
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI Gym tasks, outperforming the state of the art in every environment tested.
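The sketch below illustrates the two mechanisms named in the abstract: a clipped double-Q critic target (the minimum over a pair of critics) and delayed actor and target-network updates. It is a minimal illustration in PyTorch, not the authors' reference implementation; the network sizes, learning rates, and hyperparameter values (`gamma`, `tau`, `policy_delay`) are illustrative assumptions.

```python
# Minimal sketch of clipped double-Q targets and delayed policy updates.
# Hyperparameters and network sizes below are assumptions, not paper values.
import copy
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x):
        return self.net(x)


state_dim, action_dim, max_action = 17, 6, 1.0  # illustrative dimensions
actor = nn.Sequential(MLP(state_dim, action_dim), nn.Tanh())
critic1 = MLP(state_dim + action_dim, 1)
critic2 = MLP(state_dim + action_dim, 1)
actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(
    list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

gamma, tau, policy_delay = 0.99, 0.005, 2


def train_step(step, state, action, reward, next_state, not_done):
    # --- Critic update with a clipped double-Q target ---
    with torch.no_grad():
        next_action = actor_t(next_state).clamp(-max_action, max_action)
        sa_next = torch.cat([next_state, next_action], dim=1)
        # Take the minimum of the two target critics to limit overestimation.
        target_q = torch.min(critic1_t(sa_next), critic2_t(sa_next))
        target_q = reward + not_done * gamma * target_q

    sa = torch.cat([state, action], dim=1)
    critic_loss = (nn.functional.mse_loss(critic1(sa), target_q)
                   + nn.functional.mse_loss(critic2(sa), target_q))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # --- Delayed actor and target-network updates ---
    if step % policy_delay == 0:
        # Deterministic policy gradient through the first critic.
        actor_loss = -critic1(torch.cat([state, actor(state)], dim=1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Polyak-average the target networks toward the online networks.
        for net, target in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), target.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```

Taking the minimum over the two critics trades a small underestimation bias for protection against the compounding overestimation the abstract describes, while updating the actor and target networks less frequently than the critics keeps the policy from exploiting transient value errors.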
| Benchmark | Environment | Metric | Value | Model |
|---|---|---|---|---|
| OpenAI Gym | Humanoid-v4 | Average Return | 198.44 | TD3 |
| OpenAI Gym | HalfCheetah-v4 | Average Return | 12026.73 | TD3 |
| OpenAI Gym | Ant-v4 | Average Return | 5942.55 | TD3 |
| OpenAI Gym | Walker2d-v4 | Average Return | 2612.74 | TD3 |
| OpenAI Gym | Hopper-v4 | Average Return | 3319.98 | TD3 |