Demon Adam is a stochastic optimizer that applies the Demon (Decaying Momentum) rule to the Adam optimizer, decaying the momentum parameter over the course of training:
$$\beta_{t} = \beta_{\text{init}}\cdot\frac{1-\frac{t}{T}}{\left(1-\beta_{\text{init}}\right) + \beta_{\text{init}}\left(1-\frac{t}{T}\right)}$$

$$m_{t, i} = g_{t, i} + \beta_{t}m_{t-1, i}$$

$$v_{t+1} = \beta_{2}v_{t} + \left(1-\beta_{2}\right)g^{2}_{t}$$

$$\theta_{t} = \theta_{t-1} - \eta\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon}$$
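The update rule above can be sketched as a single NumPy step function. This is a minimal illustration, not a reference implementation: the function name and signature are invented for this example, and, as a simplifying assumption, only the second moment $\hat{v}_t$ is bias-corrected in the Adam style (the momentum buffer $m_t$ here is not a convex combination, so the standard Adam correction for $\hat{m}_t$ does not apply directly and is omitted).

```python
import numpy as np

def demon_adam_step(theta, grad, m, v, t, T,
                    beta_init=0.9, beta2=0.999, lr=1e-3, eps=1e-8):
    """One Demon Adam update (illustrative sketch, not a library API).

    theta : parameter array
    grad  : gradient of the loss at theta
    m, v  : first-moment (momentum) and second-moment buffers
    t, T  : current step (1-based) and total number of steps
    """
    # Demon decay: beta_t shrinks from ~beta_init toward 0 as t -> T
    frac = 1.0 - t / T
    beta_t = beta_init * frac / ((1.0 - beta_init) + beta_init * frac)

    # Momentum accumulation; note the gradient is NOT scaled by (1 - beta_t)
    m = grad + beta_t * m
    # Standard Adam second-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Bias-correct the second moment as in Adam (assumption: m left uncorrected)
    v_hat = v / (1.0 - beta2 ** t)

    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that at $t = T$ the decay factor reaches zero, so the final steps reduce to (roughly) plain Adam-normalized gradient descent without momentum.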