Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Q-Learning

Reinforcement Learning | Introduced 1984 | 1734 papers

Description

Q-Learning is an off-policy temporal difference control algorithm:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\right]$$

The learned action-value function $Q$ directly approximates $q_*$, the optimal action-value function, independent of the policy being followed.

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
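
The update rule above can be sketched in tabular form as follows. This is a minimal illustration and is not taken from the cited source: the corridor environment `chain_step`, the helper names `q_update` and `train`, and all hyperparameter values are assumptions chosen only to keep the example self-contained.

```python
import random


def q_update(Q, s, a, r, s_next, alpha, gamma, terminal=False):
    """Apply one Q-learning update to the table Q in place; return the TD error."""
    # The target bootstraps from the greedy (max) value of the next state,
    # whatever action the behaviour policy actually takes there.
    best_next = 0.0 if terminal else max(Q[s_next])
    td_error = r + gamma * best_next - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


def chain_step(s, a, n_states):
    """Hypothetical toy environment: a corridor of n_states cells.

    Action 0 moves left, action 1 moves right. Reaching the last cell
    yields reward 1 and ends the episode; every other step yields 0.
    """
    s_next = max(0, s - 1) if a == 0 else s + 1
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def train(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Run tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                 # explore
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1    # exploit (ties go right)
            s_next, r, done = chain_step(s, a, n_states)
            q_update(Q, s, a, r, s_next, alpha, gamma, terminal=done)
            s = s_next
    return Q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, [round(v, 3) for v in values])
```

The behaviour policy here is epsilon-greedy, while the target inside `q_update` always uses the maximum over next-state actions. That mismatch is what makes the method off-policy: the table is driven toward the optimal action values even though the agent keeps taking exploratory actions.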

Papers Using This Method

Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study | 2025-07-08
2048: Reinforcement Learning in a Delayed Reward Environment | 2025-07-07
VRAIL: Vectorized Reward-based Attribution for Interpretable Learning | 2025-06-19
Reinforcement Learning-Based Policy Optimisation For Heterogeneous Radio Access | 2025-06-18
GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC | 2025-06-17
ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture | 2025-06-16
Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning | 2025-06-16
"What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | 2025-06-11
Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning | 2025-06-08
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning | 2025-06-04
Improving Performance of Spike-based Deep Q-Learning using Ternary Neurons | 2025-06-03
Reinforcement Learning for Hanabi | 2025-05-31
Getting More from Less: Transfer Learning Improves Sleep Stage Decoding Accuracy in Peripheral Wearable Devices | 2025-05-31
On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | 2025-05-29
Combining Deep Architectures for Information Gain estimation and Reinforcement Learning for multiagent field exploration | 2025-05-29
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL | 2025-05-28
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging | 2025-05-27
The Cell Must Go On: Agar.io for Continual Reinforcement Learning | 2025-05-23
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | 2025-05-22
Reinforcement Learning for Stock Transactions | 2025-05-22