Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Gradient Clipping

General · Introduced 2000 · 167 papers

Description

One difficulty that arises with optimization of deep neural networks is that large parameter gradients can lead an SGD optimizer to update the parameters strongly into a region where the loss function is much greater, effectively undoing much of the work that was needed to get to the current solution.

Gradient Clipping clips the size of the gradients to ensure optimization performs more reasonably near sharp areas of the loss surface. It can be performed in a number of ways. One option is to simply clip the parameter gradient element-wise before a parameter update. Another option is to clip the norm $||\textbf{g}||$ of the gradient $\textbf{g}$ before a parameter update:

$$\text{if } ||\textbf{g}|| > v \text{ then } \textbf{g} \leftarrow \frac{\textbf{g}\, v}{||\textbf{g}||}$$

where $v$ is a norm threshold.
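Both variants above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular framework's implementation; the function names `clip_by_value` and `clip_by_norm` are chosen here for clarity.

```python
import numpy as np

def clip_by_value(grad, limit):
    # Element-wise clipping: each component is bounded to [-limit, limit].
    # Note this can change the direction of the gradient vector.
    return np.clip(grad, -limit, limit)

def clip_by_norm(grad, v):
    # Norm clipping: if ||g|| > v, rescale g so its norm is exactly v,
    # preserving its direction (g <- g * v / ||g||).
    norm = np.linalg.norm(grad)
    if norm > v:
        return grad * (v / norm)
    return grad

g = np.array([3.0, 4.0])        # ||g|| = 5
print(clip_by_norm(g, 1.0))     # rescaled to norm 1: [0.6 0.8]
print(clip_by_value(g, 1.0))    # each element capped: [1. 1.]
```

The example highlights the practical difference between the two options: norm clipping preserves the gradient's direction while bounding its magnitude, whereas element-wise clipping bounds each coordinate independently and may rotate the update.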

Source: Deep Learning, Goodfellow et al

Image Source: Pascanu et al

Papers Using This Method

Differentially Private Relational Learning with Entity-level Privacy Guarantees (2025-06-10)
GeoClip: Geometry-Aware Clipping for Differentially Private SGD (2025-06-06)
GCFL: A Gradient Correction-based Federated Learning Framework for Privacy-preserving CPSS (2025-06-04)
Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation (2025-05-20)
A Training Framework for Optimal and Stable Training of Polynomial Neural Networks (2025-05-16)
Dyn-D$^2$P: Dynamic Differentially Private Decentralized Learning with Provable Utility Guarantee (2025-05-10)
Can Local Representation Alignment RNNs Solve Temporal Tasks? (2025-04-18)
Technical Report: Full Version of Analyzing and Optimizing Perturbation of DP-SGD Geometrically (2025-04-08)
ZClip: Adaptive Spike Mitigation for LLM Pre-Training (2025-04-03)
World Model Agents with Change-Based Intrinsic Motivation (2025-03-26)
Tractable Representations for Convergent Approximation of Distributional HJB Equations (2025-03-07)
AdaGC: Improving Training Stability for Large Language Model Pretraining (2025-02-16)
Local Differential Privacy is Not Enough: A Sample Reconstruction Attack against Federated Learning with Local Differential Privacy (2025-02-12)
Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs (2025-02-07)
BMG-Q: Localized Bipartite Match Graph Attention Q-Learning for Ride-Pooling Order Dispatch (2025-01-23)
Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions (2025-01-08)
On the Convergence of DP-SGD with Adaptive Clipping (2024-12-27)
Optimized Gradient Clipping for Noisy Label Learning (2024-12-12)
A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise (2024-10-21)