Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SGD

Stochastic Gradient Descent

General · Introduced 1951 · 2021 papers

Description

Stochastic Gradient Descent is an iterative optimization technique that uses minibatches of data to form an unbiased estimate of the gradient, rather than computing the full gradient over all available data. That is, for weights $w$ and a loss function $L$, the update is:

$$w_{t+1} = w_{t} - \eta\,\hat{\nabla}_{w} L(w_{t})$$

where $\eta$ is the learning rate. SGD reduces the redundancy of batch gradient descent, which recomputes gradients over many similar examples before each parameter update, so it is usually much faster.
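The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic least-squares problem, not any particular paper's method; the data, the quadratic loss, and all hyperparameters here are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def minibatch_grad(w, Xb, yb):
    """Gradient of the mean squared error on one minibatch:
    an unbiased estimate of the full-data gradient."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(d)
eta = 0.05        # learning rate (eta in the update rule)
batch_size = 32

for epoch in range(20):
    # Shuffle each epoch so every minibatch is a random sample
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        # SGD step: w_{t+1} = w_t - eta * grad_hat
        w -= eta * minibatch_grad(w, X[idx], y[idx])

err = np.linalg.norm(w - w_true)
print(err)  # distance to the true weights; should be small
```

Each step touches only `batch_size` examples, which is exactly the redundancy saving over batch gradient descent described above: for a dataset with many similar examples, the minibatch estimate is nearly as informative as the full gradient at a fraction of the cost.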


Papers Using This Method

- Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime (2025-07-15)
- A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning (2025-07-09)
- Tight Generalization Error Bounds for Stochastic Gradient Descent in Non-convex Learning (2025-06-23)
- A Minimalist Optimizer Design for LLM Pretraining (2025-06-20)
- A Simplified Analysis of SGD for Linear Regression with Weight Averaging (2025-06-18)
- Sharpness-Aware Machine Unlearning (2025-06-16)
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling (2025-06-14)
- Learning single-index models via harmonic decomposition (2025-06-11)
- An Adaptive Method Stabilizing Activations for Enhanced Generalization (2025-06-10)
- Improved Scaling Laws in Linear Regression via Data Reuse (2025-06-10)
- Online Learning-guided Learning Rate Adaptation via Gradient Alignment (2025-06-10)
- Orthogonal Gradient Descent Improves Neural Calibration (2025-06-04)
- Classifying Dental Care Providers Through Machine Learning with Features Ranking (2025-06-04)
- Replay Can Provably Increase Forgetting (2025-06-04)
- Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order (2025-06-04)
- Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems (2025-06-04)
- Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks (2025-06-03)
- LightSAM: Parameter-Agnostic Sharpness-Aware Minimization (2025-05-30)
- SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training (2025-05-29)
- The Rich and the Simple: On the Implicit Bias of Adam and SGD (2025-05-29)