Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


AdaGrad

General · Introduced 2011 · 192 papers

Description

AdaGrad is a stochastic optimization method that adapts the learning rate to the parameters. It performs smaller updates for parameters associated with frequently occurring features, and larger updates for parameters associated with infrequently occurring features. In its update rule, AdaGrad modifies the general learning rate $\eta$ at each time step $t$ for every parameter $\theta_{i}$ based on the past gradients for $\theta_{i}$:

$$\theta_{t+1, i} = \theta_{t, i} - \frac{\eta}{\sqrt{G_{t, ii} + \epsilon}} \, g_{t, i}$$

where $g_{t, i}$ is the gradient with respect to $\theta_{i}$ at time step $t$, $G_{t, ii}$ is the accumulated sum of the squares of those gradients up to time step $t$, and $\epsilon$ is a small smoothing term that avoids division by zero.

The benefit of AdaGrad is that it eliminates the need to manually tune the learning rate; most implementations leave it at the default value of 0.01. Its main weakness is the accumulation of the squared gradients in the denominator: since every added term is positive, the accumulated sum keeps growing during training, causing the learning rate to shrink until it becomes infinitesimally small.
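The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production optimizer; the function name `adagrad_step`, the toy objective $f(\theta) = \theta^2$, and the chosen hyperparameters are assumptions for the example.

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.01, eps=1e-8):
    """One AdaGrad update (toy sketch).

    G accumulates the element-wise squared gradients (the diagonal
    G_{t,ii}); the effective learning rate for each parameter is
    lr / sqrt(G + eps), so frequently-updated parameters get smaller steps.
    """
    G = G + grad ** 2
    theta = theta - lr / np.sqrt(G + eps) * grad
    return theta, G

# Minimize f(theta) = theta^2 starting from theta = 5.
theta = np.array([5.0])
G = np.zeros_like(theta)
for _ in range(500):
    grad = 2.0 * theta          # gradient of theta^2
    theta, G = adagrad_step(theta, grad, G, lr=0.5)
```

Note that `G` only ever grows, which is exactly the weakness described above: on long runs the effective step size `lr / sqrt(G + eps)` decays toward zero even if the iterate is still far from a minimum.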

Image: Alec Radford

Papers Using This Method

- Recursive Bound-Constrained AdaGrad with Applications to Multilevel and Domain Decomposition Minimization (2025-07-15)
- LightSAM: Parameter-Agnostic Sharpness-Aware Minimization (2025-05-30)
- Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation (2025-05-20)
- Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness (2025-05-07)
- Structured Preconditioners in Adaptive Optimization: A Unified Analysis (2025-03-13)
- Tractable Representations for Convergent Approximation of Distributional HJB Equations (2025-03-07)
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization (2025-02-17)
- Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions (2025-01-08)
- Towards Simple and Provable Parameter-Free Adaptive Gradient Methods (2024-12-27)
- Adaptive Optimization for Enhanced Efficiency in Large-Scale Language Model Training (2024-12-06)
- A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
- Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations (2024-11-14)
- New Insight in Cervical Cancer Diagnosis Using Convolution Neural Network Architecture (2024-10-23)
- Preconditioning for Accelerated Gradient Descent Optimization and Regularization (2024-09-30)
- Stability and convergence analysis of AdaGrad for non-convex optimization via novel stopping time-based techniques (2024-09-08)
- Causal Temporal Representation Learning with Nonstationary Sparse Transition (2024-09-05)
- Machine learning models for daily rainfall forecasting in Northern Tropical Africa using tropical wave predictors (2024-08-29)
- A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality (2024-07-17)
- AdaGrad under Anisotropic Smoothness (2024-06-21)
- Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization (2024-06-07)