Q-Learning is an off-policy temporal difference control algorithm:
Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
The learned action-value function Q directly approximates q_∗, the optimal action-value function, independent of the policy being followed.
Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
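The update rule above can be sketched as tabular Q-learning on a toy problem. The environment here (a 5-state deterministic chain where moving right eventually reaches a rewarding terminal state) is an illustrative assumption, not from the source; the ε-greedy behaviour policy and the hyperparameter values are likewise illustrative. The key line is the TD target, which bootstraps from max_a Q(S_{t+1}, a) regardless of the action the behaviour policy actually takes next, which is what makes the algorithm off-policy.

```python
import random

# Hypothetical toy environment (assumption, not from the source):
# states 0..4 in a chain; action 1 moves right, action 0 moves left.
# Reaching state 4 gives reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1  # illustrative hyperparameters

def step(s, a):
    """One environment transition: returns (next_state, reward, done)."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    reward = 1.0 if done else 0.0
    return s_next, reward, done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy (can differ from the greedy
            # target policy -- Q-learning is off-policy)
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s_next, r, done = step(s, a)
            # TD target uses max over next actions, not the action taken
            target = r + (0.0 if done else GAMMA * max(Q[s_next]))
            # Q(S_t,A_t) <- Q(S_t,A_t) + alpha * [target - Q(S_t,A_t)]
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

Because transitions are deterministic, Q converges to the optimal values for every state-action pair visited often enough, even with a fixed step size: the greedy action in every non-terminal state becomes "move right", and Q(0, right) approaches γ³ · 1 = 0.729.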