Deterministic Policy Gradient, or DPG, is a policy gradient method for reinforcement learning. Instead of the policy function π(.∣s) being modeled as a probability distribution, DPG considers and calculates gradients for a deterministic policy a=μ_theta(s).