Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Nesterov Accelerated Gradient

General · Introduced 1983 · 34 papers

Description

Nesterov Accelerated Gradient is a momentum-based SGD optimizer that "looks ahead" to where the parameters will be and evaluates the gradient at that future position rather than the current one:

$$v_{t} = \gamma v_{t-1} - \eta \nabla_{\theta} J\left(\theta_{t-1} + \gamma v_{t-1}\right)$$

$$\theta_{t} = \theta_{t-1} + v_{t}$$

where $\gamma, \eta \in \mathbb{R}^{+}$.

As with SGD with momentum, $\gamma$ is usually set to $0.9$; both $\eta$ and $\gamma$ are typically less than $1$.

The intuition is that the standard momentum method first computes the gradient at the current location and then takes a big jump in the direction of the updated accumulated gradient. In contrast, Nesterov momentum first makes a big jump in the direction of the previous accumulated gradient, then measures the gradient where it ends up and makes a correction. The idea is that it is better to correct a mistake after you have made it.
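The contrast between the two update rules can be sketched in a few lines of pure Python. This is a minimal illustration, not a production optimizer; the quadratic objective, step count, and hyperparameter values are illustrative choices, not taken from this page:

```python
def momentum_step(theta, v, grad_fn, eta, gamma):
    # Classical momentum: measure the gradient at the current point,
    # then take the jump.
    v = gamma * v - eta * grad_fn(theta)
    return theta + v, v

def nesterov_step(theta, v, grad_fn, eta, gamma):
    # Nesterov: jump along the previous velocity first, then correct
    # using the gradient measured at the look-ahead point.
    v = gamma * v - eta * grad_fn(theta + gamma * v)
    return theta + v, v

grad = lambda x: 2.0 * x  # gradient of f(x) = x**2, minimized at x = 0

results = {}
for step_fn in (momentum_step, nesterov_step):
    theta, v = 5.0, 0.0  # initial parameter and velocity
    for _ in range(100):
        theta, v = step_fn(theta, v, grad, eta=0.05, gamma=0.9)
    results[step_fn.__name__] = theta
print(results)
```

Note that the only difference between the two functions is where the gradient is evaluated: classical momentum uses `theta`, Nesterov uses the look-ahead point `theta + gamma * v`. In practice, frameworks expose this as a flag on their SGD optimizer (e.g. `nesterov=True` in `torch.optim.SGD`).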

Image Source: Geoff Hinton lecture notes

Papers Using This Method

- Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters (2025-06-13)
- Nesterov Method for Asynchronous Pipeline Parallel Optimization (2025-05-02)
- Advancing RVFL networks: Robust classification with the HawkEye loss function (2024-10-01)
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness (2024-09-28)
- Optimizing Time Series Forecasting: A Comparative Study of Adam and Nesterov Accelerated Gradient on LSTM and GRU networks Using Stock Market data (2024-09-28)
- DenoMamba: A fused state-space model for low-dose CT denoising (2024-09-19)
- 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement (2024-06-12)
- Momentum-SAM: Sharpness Aware Minimization without Computational Overhead (2024-01-22)
- Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima (2023-07-13)
- Riemannian accelerated gradient methods via extrapolation (2022-08-13)
- Last-iterate convergence analysis of stochastic momentum methods for neural networks (2022-05-30)
- Automated Parking Space Detection Using Convolutional Neural Networks (2021-06-14)
- A Discrete Variational Derivation of Accelerated Methods in Optimization (2021-06-04)
- A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes (2021-02-12)
- Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance (2020-12-07)
- A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks (2020-10-25)
- Accelerated Gradient Methods for Sparse Statistical Learning with Nonconvex Penalties (2020-09-22)
- Federated Learning with Nesterov Accelerated Gradient (2020-09-18)
- GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet (2020-03-25)
- Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent (2020-02-24)