Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


R-Drop: Regularized Dropout for Neural Networks

Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

Published: 2021-06-28 · NeurIPS 2021
Tasks: Machine Translation, Image Classification, Abstractive Text Summarization, Translation, Language Modelling
Links: Paper · PDF · Code (official)

Abstract

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performances with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available at https://github.com/dropreg/R-Drop.
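The objective described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (which lives in the linked repository): for one training sample, the loss is the negative log-likelihood of two forward passes of the same input (each pass samples a different dropout sub-model) plus a weighted bidirectional KL-divergence between the two output distributions. The function and parameter names below are illustrative.

```python
import math

def kl_div(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_loss(p1, p2, target, alpha=1.0):
    """Illustrative R-Drop objective for a single training sample.

    p1, p2: output distributions from two forward passes of the same input
            (they differ because dropout samples a different sub-model each pass).
    target: index of the gold label.
    alpha:  weight on the consistency term (a hyperparameter).
    """
    # Standard negative log-likelihood, summed over both passes.
    nll = -(math.log(p1[target]) + math.log(p2[target]))
    # Bidirectional KL-divergence pulls the two sub-model outputs together.
    consistency = 0.5 * (kl_div(p1, p2) + kl_div(p2, p1))
    return nll + alpha * consistency
```

When the two passes agree exactly, the consistency term vanishes and the loss reduces to plain negative log-likelihood; the more dropout makes the two sub-models disagree, the larger the penalty.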

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | IWSLT2014 German-English | BLEU score | 37.9 | Transformer + R-Drop + Cutoff |
| Machine Translation | IWSLT2014 German-English | BLEU score | 37.25 | Transformer + R-Drop |
| Machine Translation | WMT2014 English-German | BLEU score | 30.91 | Transformer + R-Drop |
| Machine Translation | WMT2014 English-French | BLEU score | 43.95 | Transformer + R-Drop |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.51 | BART + R-Drop |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.58 | BART + R-Drop |
| Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 41.24 | BART + R-Drop |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)