AMSGrad is a stochastic optimization method that seeks to fix a convergence issue with Adam-based optimizers. AMSGrad uses the maximum of past squared gradients $v_{t}$ rather than the exponential average to update the parameters:

$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)g_{t}$$

$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2}$$

$$\hat{v}_{t} = \max\left(\hat{v}_{t-1}, v_{t}\right)$$

$$\theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon}m_{t}$$
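The update rule above can be sketched in NumPy as follows. This is a minimal illustration of the four equations, not a production optimizer; the function name `amsgrad_step` and the hyperparameter defaults (`lr`, `beta1`, `beta2`, `eps`, chosen to mirror common Adam settings) are assumptions for the example.

```python
import numpy as np

def amsgrad_step(theta, g, m, v, v_hat,
                 lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update (hypothetical helper; follows the equations above)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate v_t
    v_hat = np.maximum(v_hat, v)             # running max of v_t: the AMSGrad change vs. Adam
    theta = theta - lr / (np.sqrt(v_hat) + eps) * m
    return theta, m, v, v_hat

# Usage sketch: minimize f(theta) = theta^2 starting from theta = 1
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
v_hat = np.zeros_like(theta)
for _ in range(5000):
    g = 2 * theta                            # gradient of theta^2
    theta, m, v, v_hat = amsgrad_step(theta, g, m, v, v_hat, lr=0.01)
```

Because $\hat{v}_{t}$ is non-decreasing, the effective per-coordinate step size can only shrink over time, which is what restores the convergence guarantee that plain Adam lacks.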