Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A3C

Reinforcement Learning · Introduced 2016 · 57 papers
Source Paper: Asynchronous Methods for Deep Reinforcement Learning (Mnih et al., 2016)

Description

A3C, Asynchronous Advantage Actor-Critic, is a policy gradient algorithm in reinforcement learning that maintains a policy $\pi(a_t \mid s_t; \theta)$ and an estimate of the value function $V(s_t; \theta_v)$. It operates in the forward view and uses a mix of $n$-step returns to update both the policy and the value function. The policy and the value function are updated after every $t_{\text{max}}$ actions or when a terminal state is reached. The update performed by the algorithm can be seen as $\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta') \, A(s_t, a_t; \theta, \theta_v)$, where $A(s_t, a_t; \theta, \theta_v)$ is an estimate of the advantage function given by:

$$\sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k}; \theta_v) - V(s_t; \theta_v)$$

where $k$ can vary from state to state and is upper-bounded by $t_{\text{max}}$.
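The advantage estimate above can be computed for every step of a rollout in a single backward pass. The sketch below is illustrative and not from the source page; the function name `n_step_advantages` and the list-based interface are assumptions made for this example.

```python
# Sketch: k-step advantage estimates for one rollout of length t_max.
# For each step t, A = sum_{i<k} gamma^i r_{t+i} + gamma^k V(s_{t+k}) - V(s_t),
# where k shrinks toward the end of the rollout (as in the text above).

def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """rewards: [r_t, ..., r_{t+T-1}]; values: [V(s_t), ..., V(s_{t+T-1})];
    bootstrap_value: V(s_{t+T}), or 0.0 if the rollout ended in a terminal state."""
    advantages = []
    ret = bootstrap_value                # running k-step return, built backwards
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret            # R <- r + gamma * R
        advantages.append(ret - v)       # A(s, a) = R - V(s)
    advantages.reverse()
    return advantages
```

Working backwards means each step reuses the return already computed for its successor, so the whole rollout costs O(T) rather than O(T^2).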

The critic in A3C learns the value function while multiple actor-learners are trained in parallel, each periodically synchronizing with the global parameters. Gradients are accumulated locally over several steps before being applied, which stabilizes training; the scheme can be viewed as a parallelized form of stochastic gradient descent.
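The parallel update scheme can be sketched as follows. This is a toy illustration, not the original implementation: the "gradient" here is for a trivial scalar loss $(\theta - 1)^2$ rather than the actor-critic loss, and the worker count and step counts are arbitrary choices for the example.

```python
# Toy sketch of A3C's asynchronous update scheme: each worker copies the
# global parameters, accumulates gradients locally over t_max steps, then
# applies the accumulated gradient back to the shared parameters.
import threading

global_theta = [0.0]          # shared global parameters (toy: one scalar)
lock = threading.Lock()

def worker(t_max=5, lr=0.1, updates=20):
    for _ in range(updates):
        with lock:
            local_theta = global_theta[0]      # sync with global parameters
        grad_acc = 0.0
        for _ in range(t_max):                 # accumulate gradients locally
            # toy gradient of the loss (theta - 1)^2, drives theta toward 1
            grad_acc += 2.0 * (local_theta - 1.0)
        with lock:                             # apply accumulated gradient
            global_theta[0] -= lr * grad_acc / t_max

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each worker computes its gradient against a possibly stale copy of the parameters, updates are slightly noisy, but accumulating over `t_max` steps before applying keeps the effective step size controlled.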

Note that while the parameters $\theta$ of the policy and $\theta_v$ of the value function are shown as being separate for generality, some of the parameters are always shared in practice. Typically a convolutional neural network is used that has one softmax output for the policy $\pi(a_t \mid s_t; \theta)$ and one linear output for the value function $V(s_t; \theta_v)$, with all non-output layers shared.
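A minimal sketch of this shared-parameter design, assuming NumPy and substituting a toy fully-connected trunk for the convolutional network described above (the class and parameter names are invented for this example):

```python
# Sketch: one shared trunk feeding a softmax policy head and a linear value head.
import numpy as np

rng = np.random.default_rng(0)

class SharedActorCritic:
    def __init__(self, obs_dim, hidden_dim, n_actions):
        self.W_trunk = rng.normal(0, 0.1, (obs_dim, hidden_dim))  # shared layers
        self.W_pi = rng.normal(0, 0.1, (hidden_dim, n_actions))   # policy head
        self.W_v = rng.normal(0, 0.1, hidden_dim)                 # value head

    def forward(self, obs):
        h = np.tanh(obs @ self.W_trunk)          # shared representation
        logits = h @ self.W_pi
        policy = np.exp(logits - logits.max())   # softmax over actions
        policy /= policy.sum()
        value = float(h @ self.W_v)              # scalar V(s_t; theta_v)
        return policy, value

net = SharedActorCritic(obs_dim=4, hidden_dim=8, n_actions=2)
policy, value = net.forward(np.ones(4))
```

Sharing the trunk means a single gradient step shapes features useful to both heads, which is the practical motivation for the shared layers mentioned above.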

Papers Using This Method

- Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study (2025-07-08)
- Energy Efficient RSMA-Based LEO Satellite Communications Assisted by UAV-Mounted BD-Active RIS: A DRL Approach (2025-05-07)
- Intelligent Task Scheduling for Microservices via A3C-Based Reinforcement Learning (2025-05-01)
- Demand-Aware Beam Hopping and Power Allocation for Load Balancing in Digital Twin empowered LEO Satellite Networks (2024-10-29)
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts (2024-10-22)
- Physical Informed-Inspired Deep Reinforcement Learning Based Bi-Level Programming for Microgrid Scheduling (2024-10-15)
- Criticality and Safety Margins for Reinforcement Learning (2024-09-26)
- Evaluation of Reinforcement Learning for Autonomous Penetration Testing using A3C, Q-learning and DQN (2024-07-22)
- A Deep Reinforcement Learning Approach for Trading Optimization in the Forex Market with Multi-Agent Asynchronous Distribution (2024-05-30)
- Sum Throughput Maximization in Multi-BD Symbiotic Radio NOMA Network Assisted by Active-STAR-RIS (2024-01-16)
- Learning Actions and Control of Focus of Attention with a Log-Polar-like Sensor (2023-09-22)
- Safety Margins for Reinforcement Learning (2023-07-25)
- ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages (2023-06-02)
- Double A3C: Deep Reinforcement Learning on OpenAI Gym Games (2023-03-04)
- Reinforcement Learning for Molecular Dynamics Optimization: A Stochastic Pontryagin Maximum Principle Approach (2022-12-06)
- Point Cloud Scene Completion with Joint Color and Semantic Estimation from Single RGB-D Image (2022-10-12)
- Comparing Deep Reinforcement Learning Algorithms in Two-Echelon Supply Chains (2022-04-20)
- RL-CoSeg: A Novel Image Co-Segmentation Algorithm with Deep Reinforcement Learning (2022-04-12)
- Learning Reward Machines: A Study in Partially Observable Reinforcement Learning (2021-12-17)
- Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning (2021-03-06)