Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


RevNet

Computer Vision · Introduced 2017 · 9 papers
Source Paper

Description

A Reversible Residual Network, or RevNet, is a variant of a ResNet in which each layer's activations can be reconstructed exactly from the next layer's. The activations of most layers therefore need not be stored in memory during backpropagation. The result is a network architecture whose activation storage requirement is independent of depth, and typically at least an order of magnitude smaller than that of an equally sized ResNet.

RevNets are composed of a series of reversible blocks. The units in each layer are partitioned into two groups, denoted $x_1$ and $x_2$; the authors find that partitioning along the channel dimension works best. Each reversible block takes inputs $(x_1, x_2)$ and produces outputs $(y_1, y_2)$ according to the following additive coupling rules, inspired by the transformation in NICE (non-linear independent components estimation), with residual functions $F$ and $G$ analogous to those in standard ResNets:

$$y_1 = x_1 + F(x_2)$$
$$y_2 = x_2 + G(y_1)$$

Each layer’s activations can be reconstructed from the next layer’s activations as follows:

$$x_2 = y_2 - G(y_1)$$
$$x_1 = y_1 - F(x_2)$$
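The coupling and its inversion can be sketched in a few lines of NumPy. Here `F` and `G` are hypothetical element-wise stand-ins for the convolutional residual subnetworks of a real RevNet; the round trip shows that the inputs are recovered exactly (up to floating-point rounding), which is why activations need not be stored:

```python
import numpy as np

# Hypothetical residual functions; a real RevNet would use
# convolutional subnetworks here.
def F(x):
    return np.tanh(x)

def G(x):
    return np.tanh(x)

def rev_block_forward(x1, x2):
    # Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2):
    # Invert the coupling in reverse order: first recover x2, then x1.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Round trip over one reversible block.
x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = rev_block_forward(x1, x2)
r1, r2 = rev_block_inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

Note that the inverse must undo the two half-steps in the opposite order to the forward pass, since recovering $x_1$ requires $x_2$ first.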

Note that unlike residual blocks, reversible blocks must have a stride of 1; a larger stride discards information, so the layer cannot be inverted. Standard ResNet architectures typically have a handful of layers with a larger stride. If a RevNet architecture is defined analogously, the activations must be stored explicitly for all non-reversible layers.

Papers Using This Method

- Diffusion Models Beat GANs on Image Classification (2023-07-17)
- Conditional Injective Flows for Bayesian Imaging (2022-04-15)
- Level set learning with pseudo-reversible neural networks for nonlinear dimension reduction in function approximation (2021-12-02)
- Multi-split Reversible Transformers Can Enhance Neural Machine Translation (2021-04-01)
- Object Segmentation Without Labels with Large-Scale Generative Models (2020-06-08)
- Reconstructing Natural Scenes from fMRI Patterns using BigBiGAN (2020-01-31)
- Large Scale Adversarial Representation Learning (2019-07-04)
- Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations (2017-10-27)
- The Reversible Residual Network: Backpropagation Without Storing Activations (2017-07-14)