Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Gradient Clipping

General · Introduced 2000 · 167 papers

Description

One difficulty that arises with optimization of deep neural networks is that large parameter gradients can lead an SGD optimizer to update the parameters strongly into a region where the loss function is much greater, effectively undoing much of the work that was needed to get to the current solution.

Gradient Clipping clips the size of the gradients to ensure optimization performs more reasonably near sharp areas of the loss surface. It can be performed in a number of ways. One option is to simply clip the parameter gradient element-wise before a parameter update. Another option is to clip the norm $||\textbf{g}||$ of the gradient $\textbf{g}$ before a parameter update:

$$\text{if } ||\textbf{g}|| > v \text{ then } \textbf{g} \leftarrow \frac{\textbf{g}\, v}{||\textbf{g}||}$$

where $v$ is a norm threshold.
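Both variants above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular framework's implementation; the function names `clip_by_value` and `clip_by_norm` are chosen here for clarity.

```python
import numpy as np

def clip_by_value(grad, limit):
    # Element-wise clipping: each component is bounded to [-limit, limit].
    # Note this can change the direction of the gradient vector.
    return np.clip(grad, -limit, limit)

def clip_by_norm(grad, v):
    # Norm clipping: if ||g|| > v, rescale g so its norm is exactly v,
    # preserving its direction (g <- g * v / ||g||).
    norm = np.linalg.norm(grad)
    if norm > v:
        return grad * (v / norm)
    return grad

g = np.array([3.0, 4.0])        # ||g|| = 5
print(clip_by_norm(g, 1.0))     # rescaled to norm 1: [0.6 0.8]
print(clip_by_value(g, 1.0))    # each element capped: [1. 1.]
```

The example highlights the practical difference between the two options: norm clipping preserves the gradient's direction while bounding its magnitude, whereas element-wise clipping bounds each coordinate independently and may rotate the update.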

Source: Deep Learning, Goodfellow et al

Image Source: Pascanu et al

Papers Using This Method

Differentially Private Relational Learning with Entity-level Privacy Guarantees (2025-06-10)
GeoClip: Geometry-Aware Clipping for Differentially Private SGD (2025-06-06)
GCFL: A Gradient Correction-based Federated Learning Framework for Privacy-preserving CPSS (2025-06-04)
Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation (2025-05-20)
A Training Framework for Optimal and Stable Training of Polynomial Neural Networks (2025-05-16)
Dyn-D$^2$P: Dynamic Differentially Private Decentralized Learning with Provable Utility Guarantee (2025-05-10)
Can Local Representation Alignment RNNs Solve Temporal Tasks? (2025-04-18)
Technical Report: Full Version of Analyzing and Optimizing Perturbation of DP-SGD Geometrically (2025-04-08)
ZClip: Adaptive Spike Mitigation for LLM Pre-Training (2025-04-03)
World Model Agents with Change-Based Intrinsic Motivation (2025-03-26)
Tractable Representations for Convergent Approximation of Distributional HJB Equations (2025-03-07)
AdaGC: Improving Training Stability for Large Language Model Pretraining (2025-02-16)
Local Differential Privacy is Not Enough: A Sample Reconstruction Attack against Federated Learning with Local Differential Privacy (2025-02-12)
Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs (2025-02-07)
BMG-Q: Localized Bipartite Match Graph Attention Q-Learning for Ride-Pooling Order Dispatch (2025-01-23)
Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions (2025-01-08)
On the Convergence of DP-SGD with Adaptive Clipping (2024-12-27)
Optimized Gradient Clipping for Noisy Label Learning (2024-12-12)
A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise (2024-10-21)