Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Double Q-learning

Reinforcement Learning · Introduced 2000 · 112 papers
Source Paper

Description

Double Q-learning is an off-policy reinforcement learning algorithm that uses double estimation to counteract the overestimation problem of traditional Q-learning.

The max operator in standard Q-learning and DQN uses the same values both to select and to evaluate an action. This makes it more likely to select overestimated values, resulting in overoptimistic value estimates. To prevent this, we can decouple the selection from the evaluation, which is the idea behind Double Q-learning:

$$Y^{Q}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a} Q\left(S_{t+1}, a; \theta_{t}\right); \theta_{t}\right)$$
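The coupling described above can be seen directly in code. Below is a minimal sketch of the standard target, where `q` is a toy linear Q-function and all names are illustrative assumptions, not from a specific implementation:

```python
import numpy as np

def q(state, theta):
    # Toy linear Q-function: returns a vector of action values for `state`
    # under parameters `theta`. Purely illustrative.
    return theta @ state

def standard_target(reward, next_state, theta, gamma=0.99):
    q_next = q(next_state, theta)
    # The same value estimates (q_next) both select the action (argmax)
    # and evaluate it (max), which biases the target upward under noise.
    return reward + gamma * np.max(q_next)
```

Because a single noisy estimate is maximised over, any action whose value happens to be overestimated is both chosen and used as the evaluation, so errors compound rather than cancel.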

The Double Q-learning error can then be written as:

$$Y^{DoubleQ}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a} Q\left(S_{t+1}, a; \theta_{t}\right); \theta'_{t}\right)$$

Here the selection of the action in the $\arg\max$ is still due to the online weights $\theta_{t}$. But we use a second set of weights $\theta'_{t}$ to fairly evaluate the value of this policy.
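The decoupled target can be sketched in the same style. This is a minimal illustration under the same toy linear Q-function assumption (`q`, `theta_online`, `theta_prime` are hypothetical names, not from a specific library):

```python
import numpy as np

def q(state, theta):
    # Toy linear Q-function: returns a vector of action values for `state`
    # under parameters `theta`. Purely illustrative.
    return theta @ state

def double_target(reward, next_state, theta_online, theta_prime, gamma=0.99):
    # Select the greedy action with the online weights theta_t ...
    a_star = int(np.argmax(q(next_state, theta_online)))
    # ... but evaluate that action with the second set of weights theta'_t,
    # so a selection error is not automatically an evaluation error.
    return reward + gamma * q(next_state, theta_prime)[a_star]
```

Because the two parameter sets are trained on different experiences (or, in Double DQN, the second set is a lagged copy of the first), their estimation errors are decorrelated, which removes the systematic upward bias of the single-estimator max.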

Source: Deep Reinforcement Learning with Double Q-learning

Papers Using This Method

Reinforcement Learning-Based Policy Optimisation For Heterogeneous Radio Access (2025-06-18)
Dynamic Operating System Scheduling Using Double DQN: A Reinforcement Learning Approach to Task Optimization (2025-03-31)
Distribution-Free Uncertainty Quantification in Mechanical Ventilation Treatment: A Conformal Deep Q-Learning Framework (2024-12-17)
Beyond The Rainbow: High Performance Deep Reinforcement Learning on a Desktop PC (2024-11-06)
Bootstrapping Expectiles in Reinforcement Learning (2024-06-06)
A New View on Planning in Online Reinforcement Learning (2024-06-03)
Active search and coverage using point-cloud reinforcement learning (2023-12-18)
Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization (2023-12-10)
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control (2023-11-30)
Advancing Algorithmic Trading: A Multi-Technique Enhancement of Deep Q-Network Models (2023-11-09)
Deep Reinforcement Learning for the Heat Transfer Control of Pulsating Impinging Jets (2023-09-25)
Adaptive Multi-Agent Deep Reinforcement Learning for Timely Healthcare Interventions (2023-09-20)
Deep Reinforcement Learning for Artificial Upwelling Energy Management (2023-08-20)
Interpretable and Secure Trajectory Optimization for UAV-Assisted Communication (2023-07-05)
Optimizing Credit Limit Adjustments Under Adversarial Goals Using Reinforcement Learning (2023-06-27)
Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm (2023-06-17)
RSRM: Reinforcement Symbolic Regression Machine (2023-05-24)
Extracting Diagnosis Pathways from Electronic Health Records Using Deep Reinforcement Learning (2023-05-10)
Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity (2023-05-07)
Smoothed Q-learning (2023-03-15)