Fixup Initialization

GeneralIntroduced 20002 papers

Description

FixUp Initialization, or Fixed-Update Initialization, is an initialization method that rescales the standard initialization of residual branches by adjusting for the network architecture. Fixup aims to enables training very deep residual networks stably at a maximal learning rate without normalization.

The steps are as follows:

  1. Initialize the classification layer and the last layer of each residual branch to 0.

  2. Initialize every other layer using a standard method, e.g. Kaiming Initialization, and scale only the weight layers inside residual branches by L12m2L^{\frac{1}{2m-2}}.

  3. Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer.

Papers Using This Method