Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ReZero

General · Introduced 2020 · 7 papers

Source Paper: ReZero is All You Need: Fast Convergence at Large Depth (2020)

Description

ReZero is a normalization approach that dynamically facilitates well-behaved gradients and arbitrarily deep signal propagation. The idea is simple: ReZero initializes each layer to perform the identity operation. For each layer, a residual connection is introduced for the input signal $\mathbf{x}$ and one trainable parameter $\alpha$ that modulates the non-trivial transformation of a layer $F(\mathbf{x})$:

$$\mathbf{x}_{i+1} = \mathbf{x}_{i} + \alpha_{i} F(\mathbf{x}_{i})$$

where $\alpha = 0$ at the beginning of training. Initially the gradients for all parameters defining $F$ vanish, but they dynamically evolve to suitable values during the initial stages of training. The architecture is illustrated in the Figure.
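The update rule can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the paper: the `ReZeroBlock` class name and the `tanh` transform are assumptions chosen for clarity, and `alpha` is shown as a plain float rather than a trainable parameter in an autograd framework.

```python
import math

class ReZeroBlock:
    """Minimal sketch of a ReZero residual block (hypothetical class):
    x_{i+1} = x_i + alpha_i * F(x_i), with alpha initialized to zero
    so the block starts out as the identity function."""

    def __init__(self, transform):
        self.transform = transform  # the layer's non-trivial transformation F
        self.alpha = 0.0            # trainable residual weight, starts at 0

    def forward(self, x):
        # Residual connection scaled by alpha.
        return [xi + self.alpha * fi for xi, fi in zip(x, self.transform(x))]

# Example: wrap an elementwise tanh as the transformation F.
block = ReZeroBlock(lambda v: [math.tanh(vi) for vi in v])
x = [1.0, 2.0, 3.0]

# At initialization (alpha = 0) the block passes the signal through unchanged.
print(block.forward(x))  # → [1.0, 2.0, 3.0]

# As training increases alpha, the transformation contributes to the output.
block.alpha = 0.5
print(block.forward(x))  # output now differs from x
```

Because the block is exactly the identity at initialization, gradients propagate cleanly through arbitrarily many stacked blocks, which is the property the method exploits.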

Papers Using This Method

- ReZero: Enhancing LLM search ability by trying one-more-time (2025-04-15)
- ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze (2024-04-25)
- ReZero: Region-customizable Sound Extraction (2023-08-31)
- Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting (2022-08-30)
- Predicting the Behavior of Dealers in Over-The-Counter Corporate Bond Markets (2021-03-12)
- Transforming Recurrent Neural Networks with Attention and Fixed-point Equations (2021-01-01)
- ReZero is All You Need: Fast Convergence at Large Depth (2020-03-10)