Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


N-step Returns

Reinforcement Learning · Introduced 2000 · 29 papers

Description

$n$-step returns are used for value function estimation in reinforcement learning. Specifically, for $n$ steps we can write the complete return as:

$$R_{t}^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V_{t}\left(s_{t+n}\right)$$

We can then write an $n$-step backup, in the style of TD learning, as:

$$\Delta V_{t}\left(s_{t}\right) = \alpha\left[R_{t}^{(n)} - V_{t}\left(s_{t}\right)\right]$$

Multi-step returns often lead to faster learning with a suitably tuned $n$, trading off the lower bias of longer reward sequences against the higher variance they introduce.
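The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: `rewards[t + k]` is assumed to hold the reward $r_{t+k+1}$, and `values` is a tabular value estimate indexed by time step.

```python
def n_step_return(rewards, values, t, n, gamma):
    """R_t^{(n)}: n discounted rewards plus a bootstrapped value estimate.

    Assumes rewards[t + k] stores r_{t+k+1} and values[t + n] stores
    the current estimate V_t(s_{t+n}).
    """
    discounted_rewards = sum(gamma ** k * rewards[t + k] for k in range(n))
    bootstrap = gamma ** n * values[t + n]
    return discounted_rewards + bootstrap


def n_step_td_update(values, rewards, t, n, gamma, alpha):
    """In-place n-step TD backup: V(s_t) += alpha * (R_t^{(n)} - V(s_t))."""
    target = n_step_return(rewards, values, t, n, gamma)
    values[t] += alpha * (target - values[t])
    return values[t]
```

For example, with `gamma = 0.5`, `n = 3`, unit rewards, and a bootstrap value of 10, the target is `1 + 0.5 + 0.25 + 0.125 * 10 = 3.0`; setting `n = 1` recovers the standard one-step TD(0) target.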

Credit: Sutton and Barto, Reinforcement Learning: An Introduction

Papers Using This Method

- Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork (2025-06-12)
- Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns (2025-03-05)
- Beyond The Rainbow: High Performance Deep Reinforcement Learning on a Desktop PC (2024-11-06)
- Learning in complex action spaces without policy gradients (2024-10-08)
- Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning (2023-11-07)
- SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models (2023-10-19)
- A Long $N$-step Surrogate Stage Reward for Deep Reinforcement Learning (2023-09-21)
- Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks (2022-09-16)
- DNA: Proximal Policy Optimization with a Dual Network Architecture (2022-06-20)
- Gamma and Vega Hedging Using Deep Distributional Reinforcement Learning (2022-05-10)
- Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach (2022-04-21)
- Deep Reinforcement Learning at the Edge of the Statistical Precipice (2021-08-30)
- A coevolutionary approach to deep multi-agent reinforcement learning (2021-04-12)
- Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates (2021-01-01)
- Adaptive N-step Bootstrapping with Off-policy Data (2021-01-01)
- Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking (2020-11-15)
- A New Approach for Tactical Decision Making in Lane Changing: Sample Efficient Deep Q Learning with a Safety Feedback Reward (2020-09-24)
- Munchausen Reinforcement Learning (2020-07-28)
- Revisiting Fundamentals of Experience Replay (2020-07-13)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning (2020-07-09)