Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


V-trace

Reinforcement Learning · Introduced 2018 · 34 papers
Source Paper

Description

V-trace is an off-policy actor-critic reinforcement learning algorithm that helps tackle the lag between when actions are generated by the actors and when the learner estimates the gradient. Consider a trajectory $\left(x_t, a_t, r_t\right)_{t=s}^{t=s+n}$ generated by the actor following some policy $\mu$. We can define the $n$-step V-trace target for $V\left(x_s\right)$, our value approximation at state $x_s$, as:

$$v_s = V\left(x_s\right) + \sum_{t=s}^{s+n-1} \gamma^{t-s} \left( \prod_{i=s}^{t-1} c_i \right) \delta_t V$$

where $\delta_t V = \rho_t \left( r_t + \gamma V\left(x_{t+1}\right) - V\left(x_t\right) \right)$ is a temporal difference for $V$, and $\rho_t = \min\left(\bar{\rho}, \frac{\pi\left(a_t \mid x_t\right)}{\mu\left(a_t \mid x_t\right)}\right)$ and $c_i = \min\left(\bar{c}, \frac{\pi\left(a_i \mid x_i\right)}{\mu\left(a_i \mid x_i\right)}\right)$ are truncated importance sampling weights. We assume that the truncation levels are such that $\bar{\rho} \geq \bar{c}$.
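The target above can be computed efficiently backwards along the trajectory, using the recursion $v_s = V(x_s) + \delta_s V + \gamma c_s \left(v_{s+1} - V(x_{s+1})\right)$. A minimal NumPy sketch (the function name, array layout, and default truncation levels are illustrative, not from the source):

```python
import numpy as np

def vtrace_targets(values, rewards, rhos, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute n-step V-trace targets v_s for a single trajectory.

    values:  V(x_s) for s = 0..n (length n + 1; the last entry bootstraps the tail)
    rewards: r_t for t = 0..n-1
    rhos:    importance ratios pi(a_t | x_t) / mu(a_t | x_t), length n
    """
    n = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)  # rho_t = min(rho_bar, ratio)
    clipped_cs = np.minimum(c_bar, rhos)      # c_t   = min(c_bar, ratio)
    # Temporal-difference terms: delta_t V = rho_t (r_t + gamma V(x_{t+1}) - V(x_t))
    deltas = clipped_rhos * (rewards + gamma * values[1:] - values[:-1])
    # Backward recursion: v_s - V(x_s) = delta_s V + gamma c_s (v_{s+1} - V(x_{s+1}))
    acc = 0.0
    vs_minus_v = np.zeros(n)
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs_minus_v[t] = acc
    return values[:-1] + vs_minus_v
```

Note that when the behavior and target policies coincide ($\rho_t = c_t = 1$), the recursion reduces to the ordinary on-policy $n$-step return.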

Papers Using This Method

- World Model Agents with Change-Based Intrinsic Motivation (2025-03-26)
- Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation (2024-03-07)
- Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach (2023-12-19)
- Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform (2023-09-29)
- A Robust and Opponent-Aware League Training Method for StarCraft II (2023-09-21)
- AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning (2023-08-07)
- Exploring the Promise and Limits of Real-Time Recurrent Learning (2023-05-30)
- DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm (2023-05-29)
- Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks (2023-05-18)
- Lifelong Reinforcement Learning with Modulating Masks (2022-12-21)
- AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning (2022-11-28)
- On Efficient Reinforcement Learning for Full-length Game of StarCraft II (2022-09-23)
- EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine (2022-06-21)
- Semantic Exploration from Language Abstractions and Pretrained Representations (2022-04-08)
- Off-Policy Correction For Multi-Agent Reinforcement Learning (2021-11-22)
- AI in Human-computer Gaming: Techniques, Challenges and Opportunities (2021-11-15)
- A Distributed Deep Reinforcement Learning Technique for Application Placement in Edge and Fog Computing Environments (2021-10-24)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization (2021-09-02)
- Rethinking of AlphaStar (2021-08-07)
- An Introduction of mini-AlphaStar (2021-04-14)