
Demon ADAM

General · Introduced 2019 · 1 paper
Source paper: Demon: Improved Neural Network Training with Momentum Decay

Description

Demon Adam is a stochastic optimizer that applies the Demon (Decaying Momentum) rule to the Adam optimizer: the momentum parameter is decayed from its initial value toward zero over the course of training.

The Demon rule decays the momentum parameter over training, where $t$ is the current step and $T$ is the total number of steps:

$$\beta_t = \beta_{\text{init}} \cdot \frac{1 - \frac{t}{T}}{\left(1 - \beta_{\text{init}}\right) + \beta_{\text{init}}\left(1 - \frac{t}{T}\right)}$$

The decayed $\beta_t$ replaces $\beta_1$ in Adam's first-moment update, while the second-moment accumulation and parameter update follow Adam:

$$m_{t,i} = g_{t,i} + \beta_t m_{t-1,i}$$

$$v_{t+1} = \beta_2 v_t + \left(1 - \beta_2\right) g_t^2$$

$$\theta_t = \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
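For concreteness, below is a minimal NumPy sketch of the update rules above. It is not the paper's reference implementation: the class name `DemonAdam`, the parameter names (e.g. `total_steps`), and the choice to bias-correct only the second moment are illustrative assumptions.

```python
import numpy as np

class DemonAdam:
    """Minimal sketch of Demon Adam: Adam with the Demon momentum-decay rule.

    Illustrative only, not a reference implementation.
    """

    def __init__(self, lr=1e-3, beta_init=0.9, beta2=0.999,
                 eps=1e-8, total_steps=10_000):
        self.lr, self.beta_init, self.beta2, self.eps = lr, beta_init, beta2, eps
        self.T = total_steps   # T: total number of training steps
        self.t = 0             # t: current step
        self.m = None          # first-moment (momentum) accumulator
        self.v = None          # second-moment accumulator

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1

        # Demon decay: beta_t shrinks from beta_init toward 0 as t -> T.
        frac = 1.0 - self.t / self.T
        beta_t = self.beta_init * frac / ((1.0 - self.beta_init)
                                          + self.beta_init * frac)

        # First moment with decayed beta_t; note there is no (1 - beta_t)
        # scaling of the gradient, matching the m_t rule above.
        self.m = grads + beta_t * self.m

        # Standard Adam second-moment update with bias correction.
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grads ** 2
        v_hat = self.v / (1.0 - self.beta2 ** self.t)

        # Assumption: m is used without its own bias correction, since the
        # m_t rule above lacks the (1 - beta_t) factor of standard Adam.
        return params - self.lr * self.m / (np.sqrt(v_hat) + self.eps)


# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
opt = DemonAdam(lr=0.1, total_steps=100)
x = np.array([1.0, -2.0])
for _ in range(100):
    x = opt.step(x, 2.0 * x)
```

Because the decay schedule only needs the step counter and the planned horizon $T$, the rule drops into any Adam-style optimizer with a one-line change to how $\beta_1$ is computed each step.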

Papers Using This Method

Demon: Improved Neural Network Training with Momentum Decay (2019-10-11)