Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

2019-10-29 · ACL 2020
Tasks: Denoising, Machine Translation, Question Answering, Text Generation, Abstractive Text Summarization, Text Summarization, Natural Language Inference, Translation, Open-Domain Question Answering

Abstract

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
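The two corruption schemes the abstract singles out, sentence shuffling and text infilling, can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the function names, the fixed span length, and the per-position masking probability are assumptions made here for clarity (the paper samples span lengths from a Poisson distribution with lambda = 3 and masks roughly 30% of tokens).

```python
import random

def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, span_len=3, seed=0):
    # Replace contiguous spans of tokens with a single mask token each,
    # so the model must also predict how many tokens are missing.
    # NOTE: illustrative sketch; the paper samples span lengths from a
    # Poisson distribution (lambda = 3) rather than using a fixed length.
    rng = random.Random(seed)
    budget = int(len(tokens) * mask_ratio)  # total tokens to hide
    out, i, masked = [], 0, 0
    while i < len(tokens):
        if masked < budget and rng.random() < 0.15:
            span = min(span_len, len(tokens) - i)
            out.append(mask_token)          # whole span becomes ONE mask token
            i += span
            masked += span
        else:
            out.append(tokens[i])
            i += 1
    return out

def sentence_permutation(sentences, seed=0):
    # Shuffle the order of a document's sentences; the decoder is
    # trained to restore the original order.
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled
```

Because a whole span collapses into a single mask token, the corrupted sequence is shorter than the original, which is one way infilling differs from BERT-style per-token masking.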

Results

Task                           | Dataset          | Metric  | Value | Model
-------------------------------|------------------|---------|-------|--------------------------------
Question Answering             | SQuAD 1.1 (dev)  | F1      | 90.8  | BART Base (with text infilling)
Question Answering             | ELI5             | ROUGE-1 | 30.6  | BART
Question Answering             | ELI5             | ROUGE-2 | 6.2   | BART
Question Answering             | ELI5             | ROUGE-L | 24.3  | BART
Text Summarization             | X-Sum            | ROUGE-1 | 45.14 | BART
Text Summarization             | X-Sum            | ROUGE-2 | 22.27 | BART
Text Summarization             | X-Sum            | ROUGE-L | 37.25 | BART
Text Summarization             | CNN / Daily Mail | ROUGE-1 | 44.16 | BART
Text Summarization             | CNN / Daily Mail | ROUGE-2 | 21.28 | BART
Text Summarization             | CNN / Daily Mail | ROUGE-L | 40.9  | BART
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.16 | BART
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.28 | BART
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 40.9  | BART
Open-Domain Question Answering | ELI5             | ROUGE-1 | 30.6  | BART
Open-Domain Question Answering | ELI5             | ROUGE-2 | 6.2   | BART
Open-Domain Question Answering | ELI5             | ROUGE-L | 24.3  | BART
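The ROUGE-n values above measure n-gram overlap between a generated summary and a reference. A minimal ROUGE-1 F1 sketch in pure Python, without the stemming or bootstrap resampling of the official ROUGE toolkit, so it will not reproduce the reported numbers exactly:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    # ROUGE-1 F1: harmonic mean of unigram precision and recall.
    # Simplified: lowercased whitespace tokenization, no stemming.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, a two-word candidate that matches two of six reference words has precision 1.0 and recall 1/3, giving F1 = 0.5.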

Related Papers

- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)