Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Weight Normalization

General · Introduced 2016 · 88 papers
Source Paper

Description

Weight Normalization is a normalization method for training neural networks. It is inspired by batch normalization, but it is a deterministic method that does not share batch normalization's property of adding noise to the gradients. It reparameterizes each $k$-dimensional weight vector $\textbf{w}$ in terms of a parameter vector $\textbf{v}$ and a scalar parameter $g$, and performs stochastic gradient descent with respect to those parameters instead. Weight vectors are expressed in terms of the new parameters using:

$$\textbf{w} = \frac{g}{\Vert\textbf{v}\Vert}\textbf{v}$$

where $\textbf{v}$ is a $k$-dimensional vector, $g$ is a scalar, and $\Vert\textbf{v}\Vert$ denotes the Euclidean norm of $\textbf{v}$. This reparameterization has the effect of fixing the Euclidean norm of the weight vector $\textbf{w}$: we now have $\Vert\textbf{w}\Vert = g$, independent of the parameters $\textbf{v}$.
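A minimal NumPy sketch of the reparameterization and its gradients (function names here are illustrative, not from any particular library):

```python
import numpy as np

def weight_norm_forward(v, g):
    # w = (g / ||v||) * v  -- fixes ||w|| = g regardless of v
    return (g / np.linalg.norm(v)) * v

def weight_norm_backward(v, g, grad_w):
    # Gradients with respect to the new parameters:
    #   dL/dg = (dL/dw . v) / ||v||
    #   dL/dv = (g / ||v||) dL/dw - (g * dL/dg / ||v||^2) * v
    norm_v = np.linalg.norm(v)
    grad_g = grad_w @ v / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_v, grad_g

v = np.array([3.0, 4.0])
g = 2.0
w = weight_norm_forward(v, g)   # [1.2, 1.6]; ||w|| == g == 2 by construction

# Toy loss L = 0.5 * ||w||^2, so dL/dw = w
grad_v, grad_g = weight_norm_backward(v, g, w)
# grad_v is orthogonal to v: the update to v can only change its
# direction, while g alone controls the scale of w.
```

In practice, deep learning frameworks ship this as a built-in wrapper (e.g. PyTorch's `torch.nn.utils.parametrizations.weight_norm`) rather than requiring manual gradient code.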

Papers Using This Method

- Bio-Inspired Plastic Neural Networks for Zero-Shot Out-of-Distribution Generalization in Complex Animal-Inspired Robots (2025-03-16)
- Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization (2025-02-11)
- p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay (2024-12-05)
- AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach (2024-10-12)
- Optimization and Generalization Guarantees for Weight Normalization (2024-09-13)
- Weight Conditioning for Smooth Optimization of Neural Networks (2024-09-05)
- Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks (2024-07-24)
- Blood Glucose Control Via Pre-trained Counterfactual Invertible Neural Networks (2024-05-23)
- PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning (2024-05-10)
- Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization (2024-04-29)
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts (2024-04-23)
- A2Q+: Improving Accumulator-Aware Weight Quantization (2024-01-19)
- Function-Space Optimality of Neural Architectures with Multivariate Nonlinearities (2023-10-05)
- Gradient-Based Feature Learning under Structured Data (2023-09-07)
- A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance (2023-08-25)
- Robust Implicit Regularization via Weight Normalization (2023-05-09)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance (2023-01-31)
- Clarinet: A Music Retrieval System (2022-10-23)
- A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck (2022-07-27)
- WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis (2022-06-20)