GradientDICE

Reinforcement LearningIntroduced 20001 papers

Description

GradientDICE is a density ratio learning method for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. It optimizes a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE’s use of divergence, such that nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.

Papers Using This Method

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values2020-01-29