Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Layer Normalization

Category: General · Introduced: 2016 · Used in 24985 papers

Source Paper

Description

Unlike batch normalization, Layer Normalization directly estimates the normalization statistics from the summed inputs to the neurons within a hidden layer so the normalization does not introduce any new dependencies between training cases. It works well for RNNs and improves both the training time and the generalization performance of several existing RNN models. More recently, it has been used with Transformer models.

We compute the layer normalization statistics over all the hidden units in the same layer as follows:

$$\mu^{l} = \frac{1}{H}\sum_{i=1}^{H} a_{i}^{l}$$

$$\sigma^{l} = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$$

where $H$ denotes the number of hidden units in a layer. Under layer normalization, all the hidden units in a layer share the same normalization terms $\mu^{l}$ and $\sigma^{l}$, but different training cases have different normalization terms. Unlike batch normalization, layer normalization does not impose any constraint on the size of the mini-batch, and it can be used in the pure online regime with batch size 1.
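The statistics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function name, the `eps` stabilizer, and the optional `gain`/`bias` affine parameters (the learned scale and shift from the original paper) are chosen here for clarity; deep learning frameworks expose the same operation as a built-in layer (e.g. `LayerNorm` in PyTorch).

```python
import numpy as np

def layer_norm(a, gain=None, bias=None, eps=1e-5):
    """Normalize the summed inputs `a` (shape: [batch, H]) over the
    hidden dimension. Each row (training case) gets its own mu and
    sigma, so no dependency between training cases is introduced."""
    mu = a.mean(axis=-1, keepdims=True)      # mu^l, one per training case
    sigma = a.std(axis=-1, keepdims=True)    # sigma^l, one per training case
    a_hat = (a - mu) / (sigma + eps)         # eps guards against sigma == 0
    if gain is not None:                     # optional learned rescaling
        a_hat = a_hat * gain
    if bias is not None:                     # optional learned shift
        a_hat = a_hat + bias
    return a_hat

# Works in the pure online regime with batch size 1:
x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x)
```

Because the statistics are computed per row along the hidden axis, the result is identical whether the batch contains one example or many, which is exactly the property that distinguishes this from batch normalization.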

Papers Using This Method

- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- DASViT: Differentiable Architecture Search for Vision Transformer (2025-07-17)
- Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Langevin Flows for Modeling Neural Latent Dynamics (2025-07-15)
- Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
- Biological Processing Units: Leveraging an Insect Connectome to Pioneer Biofidelic Neural Architectures (2025-07-15)
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding (2025-07-15)
- Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
- Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI (2025-07-13)
- Learning from Synthetic Labs: Language Models as Auction Participants (2025-07-12)
- Comparative Analysis of Vision Transformers and Traditional Deep Learning Approaches for Automated Pneumonia Detection in Chest X-Rays (2025-07-11)
- Chat-Ghosting: A Comparative Study of Methods for Auto-Completion in Dialog Systems (2025-07-08)
- SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression (2025-07-08)
- Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving (2025-07-08)
- Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS (2025-07-08)
- Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification (2025-07-08)
- A Wireless Foundation Model for Multi-Task Prediction (2025-07-08)
- Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate (2025-07-08)
- SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model (2025-07-07)