Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Eligibility Trace

Reinforcement Learning · Introduced 2000 · 11 papers

Description

An eligibility trace is a short-term memory vector $\mathbf{z}_t \in \mathbb{R}^d$ that parallels the long-term weight vector $\mathbf{w}_t \in \mathbb{R}^d$. The idea is that when a component of $\mathbf{w}_t$ participates in producing an estimated value, the corresponding component of $\mathbf{z}_t$ is bumped up and then begins to fade away. Learning then occurs in that component of $\mathbf{w}_t$ if a nonzero TD error arises before the trace falls back to zero. The trace-decay parameter $\lambda \in [0, 1]$ determines the rate at which the trace decays.

Intuitively, eligibility traces tackle the credit assignment problem by combining a frequency heuristic (states visited more often deserve more credit) with a recency heuristic (states visited more recently deserve more credit).

$$E_0(s) = 0$$
$$E_t(s) = \gamma \lambda E_{t-1}(s) + \mathbf{1}(S_t = s)$$

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
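The trace update above can be sketched as tabular TD($\lambda$) policy evaluation with accumulating traces. The toy chain environment, step size, and episode count below are illustrative assumptions, not part of the source; the trace-handling lines mirror the equation directly.

```python
import numpy as np

# Illustrative toy setup (not from the source): a 5-state chain where the
# agent always moves right and receives reward 1 on entering the last state.
n_states = 5
gamma, lam, alpha = 0.9, 0.8, 0.1  # discount, trace decay, step size

V = np.zeros(n_states)  # long-term value estimates (plays the role of w_t)
z = np.zeros(n_states)  # eligibility trace, one entry per state (z_t)

def td_lambda_episode(V, z):
    z[:] = 0.0
    for s in range(n_states - 1):
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next == n_states - 1 else V[s_next]  # terminal value is 0
        delta = r + gamma * v_next - V[s]  # TD error
        z *= gamma * lam                   # every trace fades: E_t = gamma*lambda*E_{t-1}
        z[s] += 1.0                        # ...and the visited state is bumped: + 1(S_t = s)
        V += alpha * delta * z             # credit flows to all still-eligible states
    return V

for _ in range(200):
    V = td_lambda_episode(V, z)
```

After training, the values decrease with distance from the reward (the true values are $\gamma^{3-s}$ for this chain), showing how the trace propagates credit back to earlier states within a single episode.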

Papers Using This Method

- Noise-based reward-modulated learning (2025-03-31)
- Predecessor Features (2022-06-01)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning (2020-06-16)
- Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games (2020-05-26)
- Gradient Q$(σ, λ)$: A Unified Algorithm with Function Approximation for Reinforcement Learning (2019-09-06)
- Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning (2019-06-18)
- META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation (2019-04-25)
- Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control (2018-05-10)
- A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning (2018-02-09)
- A forward model at Purkinje cell synapses facilitates cerebellar anticipatory control (2016-12-01)
- Q($λ$) with Off-Policy Corrections (2016-02-16)