Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Forward gradient

General · Introduced 2000 · 12 papers
Source Paper

Description

Forward gradients are unbiased estimators of the gradient $\nabla f(\theta)$ of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, given by $g(\theta) = \langle \nabla f(\theta), v \rangle\, v$.

Here $v = (v_1, \ldots, v_n)$ is a random vector, which must satisfy the following conditions in order for $g(\theta)$ to be an unbiased estimator of $\nabla f(\theta)$:

  • $v_i \perp v_j$ for all $i \neq j$
  • $\mathbb{E}[v_i] = 0$ for all $i$
  • $\mathbb{V}[v_i] = 1$ for all $i$
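As a quick sanity check (not part of the source page), the unbiasedness of the estimator can be verified with a Monte Carlo average. The function `f` and its analytic gradient below are hypothetical examples; a standard normal $v$ satisfies all three conditions above:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(theta):
    # Analytic gradient of the example function
    # f(theta) = theta_0^2 + 3 * theta_1 * theta_2 (chosen for illustration).
    return np.array([2.0 * theta[0], 3.0 * theta[2], 3.0 * theta[1]])

def forward_gradient(theta):
    # Sample v with independent components, E[v_i] = 0, V[v_i] = 1.
    v = rng.standard_normal(theta.shape)
    # g(theta) = <grad f(theta), v> v. The analytic gradient stands in
    # for a JVP here purely to keep the check self-contained.
    return (grad_f(theta) @ v) * v

theta = np.array([1.0, -2.0, 0.5])
samples = [forward_gradient(theta) for _ in range(200_000)]
estimate = np.mean(samples, axis=0)
# The Monte Carlo mean should approach grad_f(theta) = [2.0, 1.5, -6.0].
```

Each individual draw of $g(\theta)$ is noisy, but averaging many draws recovers the true gradient, which is exactly what unbiasedness promises.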

Forward gradients can be computed with a single JVP (Jacobian-vector product), which enables the use of the forward mode of automatic differentiation instead of the usual reverse mode. Unlike reverse mode, forward mode does not need to store intermediate activations from a forward pass, giving it a smaller memory footprint.
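As an illustration (assuming JAX as the autodiff library, with a hypothetical example function `f`), `jax.jvp` returns the directional derivative $\langle \nabla f(\theta), v \rangle$ in one forward pass, from which the forward gradient follows directly:

```python
import jax
import jax.numpy as jnp

def f(theta):
    # Example scalar function f: R^n -> R (an assumption for illustration).
    return jnp.sum(jnp.sin(theta) ** 2)

key = jax.random.PRNGKey(0)
theta = jnp.array([0.1, 0.7, -1.3])
# Standard normal tangent: independent components, E[v_i]=0, V[v_i]=1.
v = jax.random.normal(key, theta.shape)

# jax.jvp evaluates f(theta) and <grad f(theta), v> in a single
# forward-mode pass, with no reverse-mode tape.
_, directional_derivative = jax.jvp(f, (theta,), (v,))
forward_grad = directional_derivative * v  # g(theta) = <grad f, v> v
```

A single sample of `forward_grad` can be plugged into SGD-style updates directly; in expectation the update direction matches the true gradient.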

Papers Using This Method

  • A Scalable Hybrid Training Approach for Recurrent Spiking Neural Networks (2025-06-17)
  • Flexible and Efficient Surrogate Gradient Modeling with Forward Gradient Injection (2024-05-31)
  • Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction (2024-03-19)
  • Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark (2024-02-18)
  • ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update (2024-02-01)
  • Convergence guarantees for forward gradient descent in the linear regression model (2023-09-26)
  • Accelerated On-Device Forward Neural Network Training with Module-Wise Descending Asynchronism (2023-09-21)
  • Can Forward Gradient Match Backpropagation? (2023-06-12)
  • Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum (2022-12-14)
  • Scaling Forward Gradient With Local Losses (2022-10-07)
  • Optimization without Backpropagation (2022-09-13)
  • Gradients without Backpropagation (2022-02-17)