Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Eligibility Trace

Reinforcement Learning · Introduced 2000 · 11 papers

Description

An eligibility trace is a short-term memory vector $\mathbf{z}_t \in \mathbb{R}^d$ that parallels the long-term weight vector $\mathbf{w}_t \in \mathbb{R}^d$. The idea is that when a component of $\mathbf{w}_t$ participates in producing an estimated value, the corresponding component of $\mathbf{z}_t$ is bumped up and then begins to fade away. Learning then occurs in that component of $\mathbf{w}_t$ if a nonzero TD error arises before the trace falls back to zero. The trace-decay parameter $\lambda \in [0, 1]$ determines the rate at which the trace decays.

Intuitively, eligibility traces tackle the credit assignment problem by combining a frequency heuristic (states visited more often deserve more credit) with a recency heuristic (states visited more recently deserve more credit).

$$E_0(s) = 0$$
$$E_t(s) = \gamma \lambda E_{t-1}(s) + \mathbf{1}(S_t = s)$$

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
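The trace update above can be sketched as tabular TD($\lambda$) policy evaluation with accumulating traces. The toy chain environment, step size, and episode count below are illustrative assumptions, not part of the source; the trace-handling lines mirror the equation directly.

```python
import numpy as np

# Illustrative toy setup (not from the source): a 5-state chain where the
# agent always moves right and receives reward 1 on entering the last state.
n_states = 5
gamma, lam, alpha = 0.9, 0.8, 0.1  # discount, trace decay, step size

V = np.zeros(n_states)  # long-term value estimates (plays the role of w_t)
z = np.zeros(n_states)  # eligibility trace, one entry per state (z_t)

def td_lambda_episode(V, z):
    z[:] = 0.0
    for s in range(n_states - 1):
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next == n_states - 1 else V[s_next]  # terminal value is 0
        delta = r + gamma * v_next - V[s]  # TD error
        z *= gamma * lam                   # every trace fades: E_t = gamma*lambda*E_{t-1}
        z[s] += 1.0                        # ...and the visited state is bumped: + 1(S_t = s)
        V += alpha * delta * z             # credit flows to all still-eligible states
    return V

for _ in range(200):
    V = td_lambda_episode(V, z)
```

After training, the values decrease with distance from the reward (the true values are $\gamma^{3-s}$ for this chain), showing how the trace propagates credit back to earlier states within a single episode.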

Papers Using This Method

- Noise-based reward-modulated learning (2025-03-31)
- Predecessor Features (2022-06-01)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning (2020-06-16)
- Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games (2020-05-26)
- Gradient Q$(σ, λ)$: A Unified Algorithm with Function Approximation for Reinforcement Learning (2019-09-06)
- Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning (2019-06-18)
- META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation (2019-04-25)
- Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control (2018-05-10)
- A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning (2018-02-09)
- A forward model at Purkinje cell synapses facilitates cerebellar anticipatory control (2016-12-01)
- Q($λ$) with Off-Policy Corrections (2016-02-16)