Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, machine reading comprehension, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when few tasks are used, up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.
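To make the pre-finetuning recipe concrete, below is a minimal sketch of massively multi-task training with a shared encoder and per-task classification heads, written in PyTorch. The `TASKS` registry, the tiny `SharedEncoder`, the size-proportional task sampling, and the log-of-label-count loss scaling are illustrative stand-ins, not the paper's exact implementation (which pre-finetunes full RoBERTa/BART models across roughly 50 datasets).

```python
import math
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical task registry: name -> (number of labels, dataset size).
# The paper's pre-finetuning spans ~50 datasets; three stand-ins here.
TASKS = {
    "boolq": (2, 9_400),
    "commonsenseqa": (5, 9_700),
    "sst2": (2, 67_000),
}

class SharedEncoder(nn.Module):
    """Tiny stand-in for a pretrained encoder such as RoBERTa."""
    def __init__(self, vocab=30_000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, ids):
        # Mean-pool token embeddings into a single sequence vector.
        return self.proj(self.emb(ids).mean(dim=1))

encoder = SharedEncoder()
heads = nn.ModuleDict({t: nn.Linear(128, n) for t, (n, _) in TASKS.items()})
params = list(encoder.parameters()) + list(heads.parameters())
opt = torch.optim.AdamW(params, lr=1e-5)

names = list(TASKS)
sizes = [float(TASKS[t][1]) for t in names]  # sample tasks ∝ dataset size

for step in range(100):
    opt.zero_grad()
    # Heterogeneous update: each step mixes gradients from several tasks.
    for task in random.choices(names, weights=sizes, k=4):
        n_labels, _ = TASKS[task]
        ids = torch.randint(0, 30_000, (8, 16))    # fake token ids
        labels = torch.randint(0, n_labels, (8,))  # fake labels
        logits = heads[task](encoder(ids))
        # Scale the loss so tasks with large label spaces don't dominate;
        # a simplified stand-in for the paper's loss-scaling scheme.
        loss = F.cross_entropy(logits, labels) / math.log(n_labels)
        loss.backward()  # accumulate gradients across tasks
    opt.step()
```

Accumulating gradients from several tasks before each optimizer step is one common way to approximate the heterogeneous multi-task batches that make training at this scale stable.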
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | BoolQ | Accuracy | 87.5 | MUPPET RoBERTa Large |
| Question Answering | BoolQ | Accuracy | 83.8 | MUPPET RoBERTa Base |
| Common Sense Reasoning | CommonsenseQA | Accuracy | 79.2 | MUPPET RoBERTa Large |
| Sentiment Analysis | SST-2 (Binary Classification) | Accuracy | 97.4 | MUPPET RoBERTa Large |
| Sentiment Analysis | SST-2 (Binary Classification) | Accuracy | 96.7 | MUPPET RoBERTa Base |
| Text Summarization | Reddit TIFU | ROUGE-1 | 30.3 | MUPPET BART Large |
| Text Summarization | Reddit TIFU | ROUGE-2 | 11.25 | MUPPET BART Large |
| Text Summarization | Reddit TIFU | ROUGE-L | 24.92 | MUPPET BART Large |
| Text Summarization | GigaWord | ROUGE-1 | 40.4 | MUPPET BART Large |
| Text Summarization | GigaWord | ROUGE-2 | 20.54 | MUPPET BART Large |
| Text Summarization | GigaWord | ROUGE-L | 36.21 | MUPPET BART Large |
| Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.45 | MUPPET BART Large |
| Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.25 | MUPPET BART Large |
| Text Summarization | CNN / Daily Mail | ROUGE-L | 41.4 | MUPPET BART Large |
| Sentence Completion | HellaSwag | Accuracy | 86.4 | MUPPET RoBERTa Large |