Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Expected Sarsa

Reinforcement Learning · Introduced 2000 · 9 papers

Description

Expected Sarsa resembles Q-learning, but instead of taking the maximum over the next state's action values, it uses their expected value, weighting each action by its probability under the current policy.

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma\sum_{a}\pi(a \mid S_{t+1})\,Q(S_{t+1}, a) - Q(S_t, A_t)\right]$$

Except for this change to the update rule, the algorithm otherwise follows the scheme of Q-learning. It is more computationally expensive than Sarsa, but it eliminates the variance due to the random selection of $A_{t+1}$.
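As a concrete illustration, the update rule above can be sketched for a tabular Q-table with an ε-greedy policy. This is a minimal sketch, not from the source: the function name, the ε-greedy target policy, and the hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One Expected Sarsa update on a tabular Q of shape [n_states, n_actions].

    Illustrative sketch: assumes the current policy is epsilon-greedy
    with respect to Q. Hyperparameter values are arbitrary defaults.
    """
    n_actions = Q.shape[1]
    # Action probabilities under the epsilon-greedy policy at s_next:
    # epsilon/n_actions for every action, plus (1 - epsilon) on the greedy one.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    # Expectation over next actions, rather than Q-learning's max.
    expected_q = np.dot(probs, Q[s_next])
    # TD update toward r + gamma * E_pi[Q(S_{t+1}, .)].
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
    return Q
```

Setting `epsilon = 0` recovers Q-learning's greedy target, which is one way to see Expected Sarsa as a generalization of that update.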

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Papers Using This Method

- Reinforcement Learning for Hanabi (2025-05-31)
- Convergent NMPC-based Reinforcement Learning Using Deep Expected Sarsa and Nonlinear Temporal Difference Learning (2025-02-07)
- Solving Royal Game of Ur Using Reinforcement Learning (2022-08-23)
- On the Convergence of SARSA with Linear Function Approximation (2022-02-14)
- A study of first-passage time minimization via Q-learning in heated gridworlds (2021-10-05)
- Chrome Dino Run using Reinforcement Learning (2020-08-15)
- Model-free Reinforcement Learning for Stochastic Stackelberg Security Games (2020-05-24)
- The Concept of Criticality in Reinforcement Learning (2018-10-16)
- Multi-step Reinforcement Learning: A Unifying Algorithm (2017-03-03)