Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A3C

Reinforcement Learning · Introduced 2016 · 57 papers
Source Paper: Asynchronous Methods for Deep Reinforcement Learning (Mnih et al., 2016)

Description

A3C, Asynchronous Advantage Actor-Critic, is a policy gradient algorithm in reinforcement learning that maintains a policy $\pi(a_t \mid s_t; \theta)$ and an estimate of the value function $V(s_t; \theta_v)$. It operates in the forward view and uses a mix of $n$-step returns to update both the policy and the value function. The policy and the value function are updated after every $t_{\text{max}}$ actions or when a terminal state is reached. The update performed by the algorithm can be seen as $\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta') \, A(s_t, a_t; \theta, \theta_v)$, where $A(s_t, a_t; \theta, \theta_v)$ is an estimate of the advantage function given by:

$$\sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k}; \theta_v) - V(s_t; \theta_v)$$

where $k$ can vary from state to state and is upper-bounded by $t_{\text{max}}$.
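The advantage estimate above can be computed for every step of a rollout in a single backward pass. The sketch below is illustrative and not from the source page; the function name `n_step_advantages` and the list-based interface are assumptions made for this example.

```python
# Sketch: k-step advantage estimates for one rollout of length t_max.
# For each step t, A = sum_{i<k} gamma^i r_{t+i} + gamma^k V(s_{t+k}) - V(s_t),
# where k shrinks toward the end of the rollout (as in the text above).

def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """rewards: [r_t, ..., r_{t+T-1}]; values: [V(s_t), ..., V(s_{t+T-1})];
    bootstrap_value: V(s_{t+T}), or 0.0 if the rollout ended in a terminal state."""
    advantages = []
    ret = bootstrap_value                # running k-step return, built backwards
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret            # R <- r + gamma * R
        advantages.append(ret - v)       # A(s, a) = R - V(s)
    advantages.reverse()
    return advantages
```

Working backwards means each step reuses the return already computed for its successor, so the whole rollout costs O(T) rather than O(T^2).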

The critic in A3C learns the value function while multiple actor-learners are trained in parallel, each periodically synchronizing with the global parameters. Gradients are accumulated locally over several steps before being applied, which stabilizes training; the scheme can be viewed as a parallelized form of stochastic gradient descent.
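The parallel update scheme can be sketched as follows. This is a toy illustration, not the original implementation: the "gradient" here is for a trivial scalar loss $(\theta - 1)^2$ rather than the actor-critic loss, and the worker count and step counts are arbitrary choices for the example.

```python
# Toy sketch of A3C's asynchronous update scheme: each worker copies the
# global parameters, accumulates gradients locally over t_max steps, then
# applies the accumulated gradient back to the shared parameters.
import threading

global_theta = [0.0]          # shared global parameters (toy: one scalar)
lock = threading.Lock()

def worker(t_max=5, lr=0.1, updates=20):
    for _ in range(updates):
        with lock:
            local_theta = global_theta[0]      # sync with global parameters
        grad_acc = 0.0
        for _ in range(t_max):                 # accumulate gradients locally
            # toy gradient of the loss (theta - 1)^2, drives theta toward 1
            grad_acc += 2.0 * (local_theta - 1.0)
        with lock:                             # apply accumulated gradient
            global_theta[0] -= lr * grad_acc / t_max

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each worker computes its gradient against a possibly stale copy of the parameters, updates are slightly noisy, but accumulating over `t_max` steps before applying keeps the effective step size controlled.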

Note that while the parameters $\theta$ of the policy and $\theta_v$ of the value function are shown as being separate for generality, some of the parameters are always shared in practice. Typically a convolutional neural network is used that has one softmax output for the policy $\pi(a_t \mid s_t; \theta)$ and one linear output for the value function $V(s_t; \theta_v)$, with all non-output layers shared.
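A minimal sketch of this shared-parameter design, assuming NumPy and substituting a toy fully-connected trunk for the convolutional network described above (the class and parameter names are invented for this example):

```python
# Sketch: one shared trunk feeding a softmax policy head and a linear value head.
import numpy as np

rng = np.random.default_rng(0)

class SharedActorCritic:
    def __init__(self, obs_dim, hidden_dim, n_actions):
        self.W_trunk = rng.normal(0, 0.1, (obs_dim, hidden_dim))  # shared layers
        self.W_pi = rng.normal(0, 0.1, (hidden_dim, n_actions))   # policy head
        self.W_v = rng.normal(0, 0.1, hidden_dim)                 # value head

    def forward(self, obs):
        h = np.tanh(obs @ self.W_trunk)          # shared representation
        logits = h @ self.W_pi
        policy = np.exp(logits - logits.max())   # softmax over actions
        policy /= policy.sum()
        value = float(h @ self.W_v)              # scalar V(s_t; theta_v)
        return policy, value

net = SharedActorCritic(obs_dim=4, hidden_dim=8, n_actions=2)
policy, value = net.forward(np.ones(4))
```

Sharing the trunk means a single gradient step shapes features useful to both heads, which is the practical motivation for the shared layers mentioned above.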

Papers Using This Method

- Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study (2025-07-08)
- Energy Efficient RSMA-Based LEO Satellite Communications Assisted by UAV-Mounted BD-Active RIS: A DRL Approach (2025-05-07)
- Intelligent Task Scheduling for Microservices via A3C-Based Reinforcement Learning (2025-05-01)
- Demand-Aware Beam Hopping and Power Allocation for Load Balancing in Digital Twin empowered LEO Satellite Networks (2024-10-29)
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts (2024-10-22)
- Physical Informed-Inspired Deep Reinforcement Learning Based Bi-Level Programming for Microgrid Scheduling (2024-10-15)
- Criticality and Safety Margins for Reinforcement Learning (2024-09-26)
- Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN (2024-07-22)
- A Deep Reinforcement Learning Approach for Trading Optimization in the Forex Market with Multi-Agent Asynchronous Distribution (2024-05-30)
- Sum Throughput Maximization in Multi-BD Symbiotic Radio NOMA Network Assisted by Active-STAR-RIS (2024-01-16)
- Learning Actions and Control of Focus of Attention with a Log-Polar-like Sensor (2023-09-22)
- Safety Margins for Reinforcement Learning (2023-07-25)
- ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages (2023-06-02)
- Double A3C: Deep Reinforcement Learning on OpenAI Gym Games (2023-03-04)
- Reinforcement Learning for Molecular Dynamics Optimization: A Stochastic Pontryagin Maximum Principle Approach (2022-12-06)
- Point Cloud Scene Completion with Joint Color and Semantic Estimation from Single RGB-D Image (2022-10-12)
- Comparing Deep Reinforcement Learning Algorithms in Two-Echelon Supply Chains (2022-04-20)
- RL-CoSeg: A Novel Image Co-Segmentation Algorithm with Deep Reinforcement Learning (2022-04-12)
- Learning Reward Machines: A Study in Partially Observable Reinforcement Learning (2021-12-17)
- Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning (2021-03-06)