Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Retrace

Reinforcement Learning · Introduced 2000 · 31 papers
Source Paper

Description

Retrace is an off-policy Q-value estimation algorithm with guaranteed convergence for any target and behaviour policy pair $(\pi, \beta)$. With off-policy rollouts for TD learning, we must use importance sampling for the update:

$$\Delta Q^{\text{imp}}\left(S_t, A_t\right) = \gamma^{t}\prod_{1\leq\tau\leq t}\frac{\pi\left(A_\tau\mid S_\tau\right)}{\beta\left(A_\tau\mid S_\tau\right)}\delta_t$$

This product of importance ratios can have very high variance, so Retrace modifies $\Delta Q$ by truncating each importance weight at a constant $c$:

$$\Delta Q^{\text{ret}}\left(S_t, A_t\right) = \gamma^{t}\prod_{1\leq\tau\leq t}\min\left(c, \frac{\pi\left(A_\tau\mid S_\tau\right)}{\beta\left(A_\tau\mid S_\tau\right)}\right)\delta_t$$
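The truncated update above can be computed directly from per-step probabilities and TD errors. Below is a minimal NumPy sketch of that formula; the function name and arguments are illustrative, not part of any published implementation, and setting `c=np.inf` recovers the plain importance-sampling update.

```python
import numpy as np

def truncated_is_update(pi_probs, beta_probs, td_errors, gamma=0.99, c=1.0):
    """Sketch of the truncated importance-weight update (illustrative, not official).

    pi_probs, beta_probs: pi(A_tau | S_tau) and beta(A_tau | S_tau) for tau = 1..T
    td_errors: TD errors delta_t for t = 1..T
    Returns Delta Q(S_t, A_t) for each timestep t.
    """
    # Truncate each importance ratio at c: min(c, pi / beta)
    ratios = np.minimum(c, np.asarray(pi_probs) / np.asarray(beta_probs))
    # Running product over 1 <= tau <= t of the truncated ratios
    traces = np.cumprod(ratios)
    # Discount factor gamma^t for t = 1..T
    discounts = gamma ** np.arange(1, len(td_errors) + 1)
    return discounts * traces * np.asarray(td_errors)
```

When the behaviour policy matches the target policy, every ratio is 1 and the update reduces to discounted TD errors, which is a quick sanity check on the truncation logic.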

Papers Using This Method

UI-Evol: Automatic Knowledge Evolving for Computer Use Agents (2025-05-28)
Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research (2025-02-25)
Dynamics of Resource Allocation in O-RANs: An In-depth Exploration of On-Policy and Off-Policy Deep Reinforcement Learning for Real-Time Applications (2024-11-17)
IDRetracor: Towards Visual Forensics Against Malicious Face Swapping (2024-08-13)
Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues (2024-04-12)
Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling (2024-02-08)
Network-thinking to optimize surveillance and control of crop parasites. A review (2023-10-11)
Distributional Estimation of Data Uncertainty for Surveillance Face Anti-spoofing (2023-09-18)
PDVN: A Patch-based Dual-view Network for Face Liveness Detection using Light Field Focal Stack (2023-01-17)
AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning (2022-11-28)
Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments (2022-07-04)
Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading (2022-06-13)
Bias-inducing geometries: an exactly solvable data model with fairness implications (2022-05-31)
Deep Learning with Logical Constraints (2022-05-01)
Marginalized Operators for Off-policy Reinforcement Learning (2022-03-30)
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? (2022-03-30)
Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions (2021-12-23)
Learning Reward Machines: A Study in Partially Observable Reinforcement Learning (2021-12-17)
Human Languages with Greater Information Density Increase Communication Speed, but Decrease Conversation Breadth (2021-12-15)
Dynamics of the market states in the space of correlation matrices with applications to financial markets (2021-07-12)