Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Sarsa

Reinforcement Learning · Introduced 1994 · 56 papers

Description

Sarsa is an on-policy TD control algorithm:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)\right]$$

This update is done after every transition from a nonterminal state $S_t$. If $S_{t+1}$ is terminal, then $Q(S_{t+1}, A_{t+1})$ is defined as zero.
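As a rough illustration of the update rule and the terminal-state convention, the following is a minimal tabular sketch. The function name `sarsa_update`, the dictionary-based table, the `terminal` flag, and the example states and step sizes are all assumptions made for this example, not part of the source.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, terminal=False):
    """Apply one tabular Sarsa update to Q in place and return the TD error.

    Q maps (state, action) pairs to value estimates (hypothetical layout).
    """
    # Q(S_{t+1}, A_{t+1}) is defined as zero when S_{t+1} is terminal.
    next_q = 0.0 if terminal else Q[(s_next, a_next)]
    td_error = r + gamma * next_q - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Illustrative values only: two made-up states and one action.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0

# Non-terminal transition: the target bootstraps from Q(s1, right).
delta = sarsa_update(Q, "s0", "right", 1.0, "s1", "right", alpha=0.5, gamma=0.9)

# Terminal transition: the bootstrap term is dropped.
delta_terminal = sarsa_update(Q, "s1", "right", 5.0, None, None,
                              alpha=0.5, gamma=0.9, terminal=True)
```

Note that the next action is passed in rather than chosen by a max over actions; that is what makes the update on-policy.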

To design an on-policy control algorithm using Sarsa, we continually estimate $q_\pi$ for the behaviour policy $\pi$ and, at the same time, change $\pi$ towards greediness with respect to $q_\pi$.
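One common way to realise this idea is to act ε-greedily with respect to the current estimates, so the behaviour policy drifts towards greediness as the estimates improve. The sketch below assumes a made-up five-state corridor environment (`step`), an ε-greedy behaviour policy, and illustrative hyperparameters; none of these come from the source.

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (-1, 1)   # step left or step right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment (an assumption): reward -1 per move, walls clamp."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, -1.0, next_state == GOAL


def greedy_action(Q, state):
    """Action with the highest current estimate (first action wins ties)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def epsilon_greedy(Q, state, epsilon, rng):
    """Behaviour policy: mostly greedy w.r.t. Q, random with prob. epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return greedy_action(Q, state)


def sarsa_control(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # The next action comes from the same policy that is being
            # evaluated, which is what makes the method on-policy.
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            next_q = 0.0 if done else Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (reward + gamma * next_q - Q[(state, action)])
            state, action = next_state, next_action
    return Q


Q = sarsa_control()
policy = [greedy_action(Q, s) for s in range(GOAL)]
```

Because the policy is derived from `Q` at every step, evaluation and improvement are interleaved rather than run as separate phases.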

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Papers Using This Method

A Unified Anti-Jamming Design in Complex Environments Based on Cross-Modal Fusion and Intelligent Decision-Making (2025-06-09)
Reinforcement Learning for Hanabi (2025-05-31)
Convergent NMPC-based Reinforcement Learning Using Deep Expected Sarsa and Nonlinear Temporal Difference Learning (2025-02-07)
Segmenting Action-Value Functions Over Time-Scales in SARSA via TD($Δ$) (2024-11-22)
A novel agent with formal goal-reaching guarantees: an experimental study with a mobile robot (2024-09-23)
Reinforcement Learning for Rate Maximization in IRS-aided OWC Networks (2024-09-07)
Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach (2024-08-23)
The State-Action-Reward-State-Action Algorithm in Spatial Prisoner's Dilemma Game (2024-06-25)
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems (2024-05-07)
Research on Robot Path Planning Based on Reinforcement Learning (2024-04-22)
State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards (2024-03-18)
Enhancing Classification Performance via Reinforcement Learning for Feature Selection (2024-03-09)
An Index Policy Based on Sarsa and Q-learning for Heterogeneous Smart Target Tracking (2024-02-19)
Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents (2023-10-02)
Career Path Recommendations for Long-term Income Maximization: A Reinforcement Learning Approach (2023-09-11)
Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment (2023-07-20)
PCG-based Static Underground Garage Scenario Generation (2023-07-08)
Convergence of SARSA with linear function approximation: The random horizon case (2023-06-07)
On Modeling Network Slicing Communication Resources with SARSA Optimization (2023-01-11)
Analysis of Reinforcement Learning Schemes for Trajectory Optimization of an Aerial Radio Unit (2022-11-18)