Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Forward gradient

General · Introduced 2000 · 12 papers
Source Paper

Description

Forward gradients are unbiased estimators of the gradient $\nabla f(\theta)$ of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, given by $g(\theta) = \langle \nabla f(\theta), v \rangle\, v$.

Here $v = (v_1, \ldots, v_n)$ is a random vector, which must satisfy the following conditions in order for $g(\theta)$ to be an unbiased estimator of $\nabla f(\theta)$:

  • $v_i \perp v_j$ for all $i \neq j$
  • $\mathbb{E}[v_i] = 0$ for all $i$
  • $\mathbb{V}[v_i] = 1$ for all $i$
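As a quick sanity check (not part of the source page), the unbiasedness of the estimator can be verified with a Monte Carlo average. The function `f` and its analytic gradient below are hypothetical examples; a standard normal $v$ satisfies all three conditions above:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(theta):
    # Analytic gradient of the example function
    # f(theta) = theta_0^2 + 3 * theta_1 * theta_2 (chosen for illustration).
    return np.array([2.0 * theta[0], 3.0 * theta[2], 3.0 * theta[1]])

def forward_gradient(theta):
    # Sample v with independent components, E[v_i] = 0, V[v_i] = 1.
    v = rng.standard_normal(theta.shape)
    # g(theta) = <grad f(theta), v> v. The analytic gradient stands in
    # for a JVP here purely to keep the check self-contained.
    return (grad_f(theta) @ v) * v

theta = np.array([1.0, -2.0, 0.5])
samples = [forward_gradient(theta) for _ in range(200_000)]
estimate = np.mean(samples, axis=0)
# The Monte Carlo mean should approach grad_f(theta) = [2.0, 1.5, -6.0].
```

Each individual draw of $g(\theta)$ is noisy, but averaging many draws recovers the true gradient, which is exactly what unbiasedness promises.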

Forward gradients can be computed with a single JVP (Jacobian-vector product), which enables the use of the forward mode of automatic differentiation instead of the usual reverse mode. Unlike reverse mode, forward mode does not need to store intermediate activations from a forward pass, giving it a smaller memory footprint.
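As an illustration (assuming JAX as the autodiff library, with a hypothetical example function `f`), `jax.jvp` returns the directional derivative $\langle \nabla f(\theta), v \rangle$ in one forward pass, from which the forward gradient follows directly:

```python
import jax
import jax.numpy as jnp

def f(theta):
    # Example scalar function f: R^n -> R (an assumption for illustration).
    return jnp.sum(jnp.sin(theta) ** 2)

key = jax.random.PRNGKey(0)
theta = jnp.array([0.1, 0.7, -1.3])
# Standard normal tangent: independent components, E[v_i]=0, V[v_i]=1.
v = jax.random.normal(key, theta.shape)

# jax.jvp evaluates f(theta) and <grad f(theta), v> in a single
# forward-mode pass, with no reverse-mode tape.
_, directional_derivative = jax.jvp(f, (theta,), (v,))
forward_grad = directional_derivative * v  # g(theta) = <grad f, v> v
```

A single sample of `forward_grad` can be plugged into SGD-style updates directly; in expectation the update direction matches the true gradient.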

Papers Using This Method

  • A Scalable Hybrid Training Approach for Recurrent Spiking Neural Networks (2025-06-17)
  • Flexible and Efficient Surrogate Gradient Modeling with Forward Gradient Injection (2024-05-31)
  • Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction (2024-03-19)
  • Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark (2024-02-18)
  • ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update (2024-02-01)
  • Convergence guarantees for forward gradient descent in the linear regression model (2023-09-26)
  • Accelerated On-Device Forward Neural Network Training with Module-Wise Descending Asynchronism (2023-09-21)
  • Can Forward Gradient Match Backpropagation? (2023-06-12)
  • Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum (2022-12-14)
  • Scaling Forward Gradient With Local Losses (2022-10-07)
  • Optimization without Backpropagation (2022-09-13)
  • Gradients without Backpropagation (2022-02-17)