Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Sarsa Lambda

Reinforcement Learning · Introduced 2000

Description

Sarsa($\lambda$) extends eligibility traces to action-value methods. It has the same update rule as TD($\lambda$), but uses the action-value form of the TD error:

$$\delta_{t} = R_{t+1} + \gamma\hat{q}\left(S_{t+1}, A_{t+1}, \mathbf{w}_{t}\right) - \hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right)$$

and the action-value form of the eligibility trace:

$$\mathbf{z}_{-1} = \mathbf{0}$$

$$\mathbf{z}_{t} = \gamma\lambda\mathbf{z}_{t-1} + \nabla\hat{q}\left(S_{t}, A_{t}, \mathbf{w}_{t}\right), \quad 0 \leq t \leq T$$
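For completeness, the weight update itself is the same semi-gradient rule as in TD($\lambda$), applied with this action-value trace:

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} + \alpha\delta_{t}\mathbf{z}_{t}$$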

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
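The updates above can be sketched in a short tabular implementation, where the gradient of $\hat{q}$ with respect to the weights is a one-hot vector and the weight vector is just the Q-table. The chain environment, hyperparameters, and function names below are illustrative assumptions, not from the source:

```python
import numpy as np

def sarsa_lambda(n_states=5, n_actions=2, episodes=200, alpha=0.1,
                 gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating traces on a toy chain MDP.

    States 0..n_states-1; action 1 moves right, action 0 moves left.
    Reaching the last state gives reward +1 and ends the episode.
    (This environment is a hypothetical example for illustration.)
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))  # the weight vector w, as a table

    def step(s, a):
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s2 == n_states - 1
        return s2, (1.0 if done else 0.0), done

    def policy(s):
        # epsilon-greedy with random tie-breaking
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))

    for _ in range(episodes):
        z = np.zeros_like(Q)              # eligibility trace, z_{-1} = 0
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2)
            # action-value TD error delta_t; q(terminal, .) = 0
            target = r if done else r + gamma * Q[s2, a2]
            delta = target - Q[s, a]
            z[s, a] += 1.0                # grad of q w.r.t. w is one-hot here
            Q += alpha * delta * z        # w_{t+1} = w_t + alpha * delta_t * z_t
            z *= gamma * lam              # decay the trace for the next step
            s, a = s2, a2
    return Q

Q = sarsa_lambda()
```

After training, the greedy policy should prefer moving right toward the rewarding terminal state, e.g. `Q[3, 1] > Q[3, 0]`.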