Description
LayerDrop is a form of structured dropout for Transformer models that acts as a regularizer during training and enables efficient pruning at inference time. During training, each layer is randomly dropped with rate $p$. At inference, a shallower network can be obtained with an "every other" strategy: pruning with rate $p$ means dropping the layers at depth $d$ such that $d \equiv 0 \left(\text{mod}\left\lfloor \frac{1}{p} \right\rfloor\right)$.
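A minimal sketch of the two phases described above, in plain Python (the function names and the list-of-callables representation of a layer stack are illustrative assumptions, not the paper's implementation): during training each layer is skipped independently with probability p, and at inference the "every other" rule keeps only layers whose depth d does not satisfy d ≡ 0 (mod ⌊1/p⌋).

```python
import math
import random

def layerdrop_forward(layers, x, p, training=True):
    """Apply a stack of layers, skipping each with probability p during training.

    The residual stream carries x through a dropped layer unchanged,
    which is what makes skipping a layer well-defined.
    """
    for layer in layers:
        if training and random.random() < p:
            continue  # drop this layer for this forward pass
        x = layer(x)
    return x

def prune_every_other(layers, p):
    """Prune to rate p: drop layers at depth d with d = 0 (mod floor(1/p))."""
    step = math.floor(1 / p)
    return [layer for d, layer in enumerate(layers) if d % step != 0]
```

For example, with a 6-layer stack and p = 0.5, `prune_every_other` drops depths 0, 2, 4 and keeps depths 1, 3, 5, halving the network without retraining.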
Papers Using This Method
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (2024-04-25)
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention (2022-11-21)
Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation (2020-10-16)
Reducing Transformer Depth on Demand with Structured Dropout (2019-09-25)