Description
Soft Actor Critic (Autotuned Temperature) is a modification of the SAC reinforcement learning algorithm. SAC can be brittle with respect to the temperature hyperparameter. Unlike in conventional reinforcement learning, where the optimal policy is independent of the scaling of the reward function, in maximum entropy reinforcement learning the scaling factor must be compensated by the choice of a suitable temperature, and a sub-optimal temperature can drastically degrade performance. To resolve this issue, SAC with Autotuned Temperature adds an automatic gradient-based temperature tuning method that adjusts the expected entropy of the policy over visited states to match a target value.
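The gradient-based tuning step can be sketched as follows. This is a minimal illustration, not the reference implementation: the names (`log_alpha`, `target_entropy`, `update_temperature`) and the choice of target entropy as the negative action dimensionality are common conventions assumed here, and the batch of log-probabilities is synthetic.

```python
import torch

# Common heuristic (assumed here): target entropy = -|A|,
# the negative of the action space dimensionality.
action_dim = 6
target_entropy = -float(action_dim)

# Optimize log(alpha) rather than alpha so the temperature stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_prob: torch.Tensor) -> float:
    """One gradient step on the temperature objective
    J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)].

    log_prob: log-probabilities of actions sampled from the current
    policy at a batch of visited states (detached from the policy graph).
    Returns the updated temperature alpha.
    """
    alpha_loss = -(log_alpha.exp() * (log_prob.detach() + target_entropy)).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()

# Synthetic batch: policy entropy currently above the target, so the
# update nudges alpha downward (less entropy bonus).
batch_log_prob = torch.full((256,), -12.0)
alpha = update_temperature(batch_log_prob)
```

Because only `log_alpha` requires gradients, the update leaves the policy and critic networks untouched; the temperature rises when the policy's entropy falls below the target and shrinks when it exceeds it.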
Papers Using This Method
- Leveraging Demonstrations with Latent Space Priors (2022-10-26)
- Soft Actor-Critic Deep Reinforcement Learning for Fault Tolerant Flight Control (2022-02-16)
- Self-Supervised Policy Adaptation during Deployment (2020-07-08)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network (2020-03-16)
- Discrete and Continuous Action Representation for Practical RL in Video Games (2019-12-23)
- Soft Actor-Critic Algorithms and Applications (2018-12-13)