Description
Soft Actor Critic (Autotuned Temperature) is a modification of the SAC reinforcement learning algorithm. SAC can be brittle with respect to the temperature hyperparameter. Unlike in conventional reinforcement learning, where the optimal policy is independent of the scaling of the reward function, in maximum entropy reinforcement learning the scaling factor must be compensated by the choice of a suitable temperature, and a sub-optimal temperature can drastically degrade performance. To resolve this issue, SAC with Autotuned Temperature adds an automatic gradient-based temperature tuning method that adjusts the expected entropy of the policy over visited states to match a target value.
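The gradient-based tuning step can be sketched as follows. This is a minimal illustration, not the reference implementation: the names (`log_alpha`, `target_entropy`, `update_temperature`) and the choice of target entropy as the negative action dimensionality are common conventions assumed here, and the batch of log-probabilities is synthetic.

```python
import torch

# Common heuristic (assumed here): target entropy = -|A|,
# the negative of the action space dimensionality.
action_dim = 6
target_entropy = -float(action_dim)

# Optimize log(alpha) rather than alpha so the temperature stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_prob: torch.Tensor) -> float:
    """One gradient step on the temperature objective
    J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)].

    log_prob: log-probabilities of actions sampled from the current
    policy at a batch of visited states (detached from the policy graph).
    Returns the updated temperature alpha.
    """
    alpha_loss = -(log_alpha.exp() * (log_prob.detach() + target_entropy)).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()

# Synthetic batch: policy entropy currently above the target, so the
# update nudges alpha downward (less entropy bonus).
batch_log_prob = torch.full((256,), -12.0)
alpha = update_temperature(batch_log_prob)
```

Because only `log_alpha` requires gradients, the update leaves the policy and critic networks untouched; the temperature rises when the policy's entropy falls below the target and shrinks when it exceeds it.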
Papers Using This Method
- Leveraging Demonstrations with Latent Space Priors (2022-10-26)
- Soft Actor-Critic Deep Reinforcement Learning for Fault Tolerant Flight Control (2022-02-16)
- Self-Supervised Policy Adaptation during Deployment (2020-07-08)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network (2020-03-16)
- Discrete and Continuous Action Representation for Practical RL in Video Games (2019-12-23)
- Soft Actor-Critic Algorithms and Applications (2018-12-13)