Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Demon

General · Introduced 2000 · 16 papers
Source Paper

Description

Decaying Momentum, or Demon, is a stochastic optimizer motivated by decaying the total contribution of a gradient to all future updates. A particular gradient term $g_t$ contributes a total of $\eta\sum_{i}\beta^{i}$ of its "energy" to all future gradient updates, which is the geometric sum $\sum_{i=1}^{\infty}\beta^{i} = \beta\sum_{i=0}^{\infty}\beta^{i} = \frac{\beta}{1-\beta}$. Decaying this sum yields the Demon algorithm. Let $\beta_{\text{init}}$ be the initial $\beta$; then at the current step $t$ of $T$ total steps, the decay routine is given by solving the following for $\beta_t$:

$$\frac{\beta_t}{1-\beta_t} = \left(1 - \frac{t}{T}\right)\frac{\beta_{\text{init}}}{1-\beta_{\text{init}}}$$

where $\left(1 - t/T\right)$ is the proportion of iterations remaining. Note that Demon typically requires no hyperparameter tuning, as $\beta_t$ is usually decayed to $0$ or a small negative value at time $T$. Improved performance is observed by delaying the start of the decay. Demon can be applied to any gradient descent algorithm with a momentum parameter.
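The decay equation above has a closed-form solution for $\beta_t$: writing $c = (1 - t/T)\,\beta_{\text{init}}/(1-\beta_{\text{init}})$, we get $\beta_t = c/(1+c)$. A minimal sketch of this schedule applied to SGD with momentum is shown below; the function names (`demon_beta`, `demon_sgd_step`) are illustrative, not taken from the paper's code.

```python
def demon_beta(t, T, beta_init=0.9):
    """Solve beta_t / (1 - beta_t) = (1 - t/T) * beta_init / (1 - beta_init)
    for beta_t in closed form."""
    c = (1.0 - t / T) * beta_init / (1.0 - beta_init)
    return c / (1.0 + c)

def demon_sgd_step(params, grads, velocity, lr, t, T, beta_init=0.9):
    """One SGD-with-momentum update where the momentum coefficient
    follows the Demon decay schedule (illustrative sketch)."""
    beta_t = demon_beta(t, T, beta_init)
    # Decayed momentum accumulation, then the usual descent step.
    velocity = [beta_t * v + g for v, g in zip(velocity, grads)]
    params = [p - lr * v for p, v in zip(params, velocity)]
    return params, velocity
```

Note that `demon_beta(0, T)` recovers `beta_init` and `demon_beta(T, T)` is exactly `0`, matching the schedule's endpoints; the same `demon_beta` could equally be plugged into the momentum parameter of other optimizers such as Adam.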

Papers Using This Method

- Representation and Interpretation in Artificial and Natural Computing (2025-02-14)
- DEMONet: Underwater Acoustic Target Recognition based on Multi-Expert Network and Cross-Temporal Variational Autoencoder (2024-11-05)
- Training-free Diffusion Model Alignment with Sampling Demons (2024-10-08)
- Neural Entropy (2024-09-05)
- Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph (2024-08-28)
- A Decentralized and Self-Adaptive Approach for Monitoring Volatile Edge Environments (2024-05-13)
- Reflective Linguistic Programming (RLP): A Stepping Stone in Socially-Aware AGI (SocialAGI) (2023-05-22)
- How to train your demon to do fast information erasure without heat production (2023-05-17)
- Thermodynamic AI and the fluctuation frontier (2023-02-09)
- Differentiable Neural Computers with Memory Demon (2022-11-05)
- Static Knowledge vs. Dynamic Argumentation: A Dual Theory Based on Kripke Semantics (2022-09-27)
- Boosting Adversarial Transferability of MLP-Mixer (2022-04-26)
- Learning Relational Rules from Rewards (2022-03-25)
- Nonequilibrium thermodynamics of self-supervised learning (2021-06-16)
- Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty (2021-04-16)
- Demon: Improved Neural Network Training with Momentum Decay (2019-10-11)