Description
LayerDrop is a form of structured dropout for Transformer models that acts as a regularizer during training and enables efficient pruning at inference time. During training, each layer is randomly dropped with rate $p$. At inference, a shallower network can be obtained with an "every other" strategy: pruning with rate $p$ means dropping the layers at depth $d$ such that $d \equiv 0 \left(\text{mod}\left\lfloor \frac{1}{p} \right\rfloor\right)$.
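A minimal sketch of the two phases described above, in plain Python (the function names and the list-of-callables representation of a layer stack are illustrative assumptions, not the paper's implementation): during training each layer is skipped independently with probability p, and at inference the "every other" rule keeps only layers whose depth d does not satisfy d ≡ 0 (mod ⌊1/p⌋).

```python
import math
import random

def layerdrop_forward(layers, x, p, training=True):
    """Apply a stack of layers, skipping each with probability p during training.

    The residual stream carries x through a dropped layer unchanged,
    which is what makes skipping a layer well-defined.
    """
    for layer in layers:
        if training and random.random() < p:
            continue  # drop this layer for this forward pass
        x = layer(x)
    return x

def prune_every_other(layers, p):
    """Prune to rate p: drop layers at depth d with d = 0 (mod floor(1/p))."""
    step = math.floor(1 / p)
    return [layer for d, layer in enumerate(layers) if d % step != 0]
```

For example, with a 6-layer stack and p = 0.5, `prune_every_other` drops depths 0, 2, 4 and keeps depths 1, 3, 5, halving the network without retraining.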
Papers Using This Method
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (2024-04-25)
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention (2022-11-21)
Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation (2020-10-16)
Reducing Transformer Depth on Demand with Structured Dropout (2019-09-25)