Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TD Lambda

Reinforcement Learning · Introduced 2000 · 14 papers

Description

$TD(\lambda)$ is a generalisation of $TD(n)$ reinforcement learning algorithms that employs an eligibility trace and $\lambda$-weighted returns. The eligibility trace vector is initialized to zero at the beginning of the episode, is incremented on each time step by the value gradient, and then fades away by $\gamma\lambda$:

$$\mathbf{z}_{-1} = \mathbf{0}$$

$$\mathbf{z}_{t} = \gamma\lambda\mathbf{z}_{t-1} + \nabla\hat{v}\left(S_{t}, \mathbf{w}_{t}\right), \quad 0 \leq t \leq T$$

The eligibility trace keeps track of which components of the weight vector have contributed to recent state valuations. Here $\nabla\hat{v}\left(S_{t}, \mathbf{w}_{t}\right)$ is the gradient of the value estimate, which under linear function approximation is simply the feature vector.
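The trace update above can be sketched in a few lines of NumPy, assuming linear function approximation so that the gradient equals the feature vector (the function name and example values here are illustrative, not from the source):

```python
import numpy as np

def update_trace(z, features, gamma, lam):
    """Decay the trace by gamma * lambda, then add the current gradient.

    Under linear function approximation the gradient of v_hat(S_t, w)
    is simply the feature vector of S_t.
    """
    return gamma * lam * z + features

# Initialise z_{-1} = 0 and accumulate over two time steps
z = np.zeros(3)
z = update_trace(z, np.array([1.0, 0.0, 0.0]), gamma=0.9, lam=0.8)
z = update_trace(z, np.array([0.0, 1.0, 0.0]), gamma=0.9, lam=0.8)
print(z)  # the first component has faded by gamma * lambda = 0.72
```

After two steps the first feature's contribution has decayed by one factor of $\gamma\lambda$, illustrating how credit assigned to older states fades geometrically.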

The TD error for state-value prediction is:

$$\delta_{t} = R_{t+1} + \gamma\hat{v}\left(S_{t+1}, \mathbf{w}_{t}\right) - \hat{v}\left(S_{t}, \mathbf{w}_{t}\right)$$

In $TD(\lambda)$, the weight vector is updated on each step in proportion to the scalar TD error and the vector eligibility trace:

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} + \alpha\delta_{t}\mathbf{z}_{t}$$
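Putting the three updates together, a minimal sketch of semi-gradient $TD(\lambda)$ for state-value prediction might look like the following. The function name, episode encoding, and toy two-state chain are assumptions for illustration; the update rules themselves follow the equations above, with linear (one-hot) features so the gradient is the feature vector:

```python
import numpy as np

def td_lambda_episode(w, episode, alpha, gamma, lam):
    """Run one episode of semi-gradient TD(lambda) state-value prediction.

    `episode` is a list of (features, reward, next_features) transitions;
    next_features is None at the terminal state, where v_hat is 0.
    """
    z = np.zeros_like(w)                 # z_{-1} = 0
    for x, r, x_next in episode:
        v = w @ x
        v_next = 0.0 if x_next is None else w @ x_next
        delta = r + gamma * v_next - v   # TD error delta_t
        z = gamma * lam * z + x          # trace update (gradient = x)
        w = w + alpha * delta * z        # w_{t+1} = w_t + alpha * delta_t * z_t
    return w

# Tiny deterministic two-state chain with one-hot features:
# s0 -> s1 (reward 0), s1 -> terminal (reward 1)
s0, s1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
episode = [(s0, 0.0, s1), (s1, 1.0, None)]
w = np.zeros(2)
for _ in range(200):
    w = td_lambda_episode(w, episode, alpha=0.1, gamma=1.0, lam=0.9)
print(w)  # approaches the true values v(s0) = 1, v(s1) = 1
```

With $\gamma = 1$ the true value of both states is 1 (each trajectory collects a total reward of 1 from either state onward), and repeated episodes drive the weights toward those values.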

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Papers Using This Method

On-line Policy Improvement using Monte-Carlo Search (2025-01-09)
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming (2024-06-02)
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach (2023-12-19)
A Robust and Opponent-Aware League Training Method for StarCraft II (2023-09-21)
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning (2023-08-07)
On Efficient Reinforcement Learning for Full-length Game of StarCraft II (2022-09-23)
AI in Human-computer Gaming: Techniques, Challenges and Opportunities (2021-11-15)
Search in Imperfect Information Games (2021-11-10)
Rethinking of AlphaStar (2021-08-07)
An Introduction of mini-AlphaStar (2021-04-14)
Deep Reinforcement Learning with Function Properties in Mean Reversion Strategies (2021-01-09)
TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game (2020-11-27)
AlphaStar: An Evolutionary Computation Perspective (2019-02-05)
A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks (2016-06-20)