Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Sarsa Lambda

Reinforcement Learning · Introduced 2000

Description

Sarsa($\lambda$) extends eligibility traces to action-value methods. It has the same update rule as TD($\lambda$), but uses the action-value form of the TD error:

$$\delta_{t} = R_{t+1} + \gamma\hat{q}\left(S_{t+1}, A_{t+1}, \mathbf{w}_{t}\right) - \hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right)$$

and the action-value form of the eligibility trace:

$$\mathbf{z}_{-1} = \mathbf{0}$$

$$\mathbf{z}_{t} = \gamma\lambda\mathbf{z}_{t-1} + \nabla\hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right), \quad 0 \leq t \leq T$$
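For completeness, the weight update itself is the same semi-gradient rule as in TD($\lambda$), applied with this action-value trace:

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} + \alpha\delta_{t}\mathbf{z}_{t}$$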

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
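The updates above can be sketched in a short tabular implementation, where the gradient of $\hat{q}$ with respect to the weights is a one-hot vector and the weight vector is just the Q-table. The chain environment, hyperparameters, and function names below are illustrative assumptions, not from the source:

```python
import numpy as np

def sarsa_lambda(n_states=5, n_actions=2, episodes=200, alpha=0.1,
                 gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating traces on a toy chain MDP.

    States 0..n_states-1; action 1 moves right, action 0 moves left.
    Reaching the last state gives reward +1 and ends the episode.
    (This environment is a hypothetical example for illustration.)
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))  # the weight vector w, as a table

    def step(s, a):
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s2 == n_states - 1
        return s2, (1.0 if done else 0.0), done

    def policy(s):
        # epsilon-greedy with random tie-breaking
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))

    for _ in range(episodes):
        z = np.zeros_like(Q)              # eligibility trace, z_{-1} = 0
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2)
            # action-value TD error delta_t; q(terminal, .) = 0
            target = r if done else r + gamma * Q[s2, a2]
            delta = target - Q[s, a]
            z[s, a] += 1.0                # grad of q w.r.t. w is one-hot here
            Q += alpha * delta * z        # w_{t+1} = w_t + alpha * delta_t * z_t
            z *= gamma * lam              # decay the trace for the next step
            s, a = s2, a2
    return Q

Q = sarsa_lambda()
```

After training, the greedy policy should prefer moving right toward the rewarding terminal state, e.g. `Q[3, 1] > Q[3, 0]`.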