Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Weight Normalization

General · Introduced 2016 · 88 papers
Source Paper

Description

Weight Normalization is a normalization method for training neural networks. It is inspired by batch normalization, but it is a deterministic method that does not share batch normalization's property of adding noise to the gradients. It reparameterizes each $k$-dimensional weight vector $\textbf{w}$ in terms of a parameter vector $\textbf{v}$ and a scalar parameter $g$, and performs stochastic gradient descent with respect to those parameters instead. Weight vectors are expressed in terms of the new parameters using:

$$\textbf{w} = \frac{g}{\Vert\textbf{v}\Vert}\textbf{v}$$

where $\textbf{v}$ is a $k$-dimensional vector, $g$ is a scalar, and $\Vert\textbf{v}\Vert$ denotes the Euclidean norm of $\textbf{v}$. This reparameterization has the effect of fixing the Euclidean norm of the weight vector $\textbf{w}$: we now have $\Vert\textbf{w}\Vert = g$, independent of the parameters $\textbf{v}$.
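A minimal NumPy sketch of the reparameterization and its gradients (function names here are illustrative, not from any particular library):

```python
import numpy as np

def weight_norm_forward(v, g):
    # w = (g / ||v||) * v  -- fixes ||w|| = g regardless of v
    return (g / np.linalg.norm(v)) * v

def weight_norm_backward(v, g, grad_w):
    # Gradients with respect to the new parameters:
    #   dL/dg = (dL/dw . v) / ||v||
    #   dL/dv = (g / ||v||) dL/dw - (g * dL/dg / ||v||^2) * v
    norm_v = np.linalg.norm(v)
    grad_g = grad_w @ v / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_v, grad_g

v = np.array([3.0, 4.0])
g = 2.0
w = weight_norm_forward(v, g)   # [1.2, 1.6]; ||w|| == g == 2 by construction

# Toy loss L = 0.5 * ||w||^2, so dL/dw = w
grad_v, grad_g = weight_norm_backward(v, g, w)
# grad_v is orthogonal to v: the update to v can only change its
# direction, while g alone controls the scale of w.
```

In practice, deep learning frameworks ship this as a built-in wrapper (e.g. PyTorch's `torch.nn.utils.parametrizations.weight_norm`) rather than requiring manual gradient code.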

Papers Using This Method

- Bio-Inspired Plastic Neural Networks for Zero-Shot Out-of-Distribution Generalization in Complex Animal-Inspired Robots (2025-03-16)
- Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization (2025-02-11)
- p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay (2024-12-05)
- AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach (2024-10-12)
- Optimization and Generalization Guarantees for Weight Normalization (2024-09-13)
- Weight Conditioning for Smooth Optimization of Neural Networks (2024-09-05)
- Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks (2024-07-24)
- Blood Glucose Control Via Pre-trained Counterfactual Invertible Neural Networks (2024-05-23)
- PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning (2024-05-10)
- Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization (2024-04-29)
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts (2024-04-23)
- A2Q+: Improving Accumulator-Aware Weight Quantization (2024-01-19)
- Function-Space Optimality of Neural Architectures with Multivariate Nonlinearities (2023-10-05)
- Gradient-Based Feature Learning under Structured Data (2023-09-07)
- A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance (2023-08-25)
- Robust Implicit Regularization via Weight Normalization (2023-05-09)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance (2023-01-31)
- Clarinet: A Music Retrieval System (2022-10-23)
- A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck (2022-07-27)
- WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis (2022-06-20)