True Online TD Lambda

Reinforcement Learning
Introduced 2000

Description

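True Online $TD\left(\lambda\right)$ is an efficient backward-view counterpart of the ideal online $\lambda$-return algorithm. It inverts that forward-view algorithm into an incremental update based on eligibility traces, so each step costs about as much as a conventional $TD\left(\lambda\right)$ step. With linear function approximation it produces the same sequence of weight vectors as the online $\lambda$-return algorithm. It uses dutch traces rather than the accumulating traces of conventional $TD\left(\lambda\right)$.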
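The backward-view update can be sketched in a few lines. The following is a minimal illustration for linear value estimation, not a reference implementation: the function name `true_online_td_lambda_episode`, the helper `dot`, and the representation of an episode as a list of `(x, reward, x_next)` tuples are all assumptions made for this example, and a terminal state is assumed to be encoded as an all-zero feature vector.

```python
def dot(a, b):
    """Inner product of two equal-length sequences (hypothetical helper)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def true_online_td_lambda_episode(w, transitions, alpha, gamma, lam):
    """Run one episode of true online TD(lambda) with linear features.

    w           -- initial weight vector (list of floats)
    transitions -- list of (x, reward, x_next) feature-vector tuples;
                   assumption: a terminal x_next is all zeros
    alpha       -- step size
    gamma       -- discount factor
    lam         -- trace-decay parameter lambda
    Returns the updated weight vector as a new list.
    """
    w = list(w)
    z = [0.0] * len(w)   # dutch eligibility trace, reset at episode start
    v_old = 0.0          # value of the current state under the previous weights

    for x, reward, x_next in transitions:
        v = dot(w, x)
        v_next = dot(w, x_next)
        delta = reward + gamma * v_next - v  # ordinary TD error

        # Dutch trace: decay, then add x scaled by a correction term.
        zx = dot(z, x)
        z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
             for zi, xi in zip(z, x)]

        # Weight update with the extra (v - v_old) correction terms.
        w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
             for wi, zi, xi in zip(w, z, x)]

        v_old = v_next

    return w


# Two-state chain with one-hot features; the only reward arrives at the end.
episode = [([1.0, 0.0], 0.0, [0.0, 1.0]),
           ([0.0, 1.0], 1.0, [0.0, 0.0])]
w_lam0 = true_online_td_lambda_episode([0.0, 0.0], episode, 0.5, 1.0, 0.0)
w_lam1 = true_online_td_lambda_episode([0.0, 0.0], episode, 0.5, 1.0, 1.0)
```

With `lam=0` the trace equals the current feature vector and the update reduces to one-step TD, so only the second state's weight changes. With `lam=1` the trace still carries the first state when the reward arrives, so both weights receive credit.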

Source: Sutton and van Seijen