


Muppet: Massive Multi-task Representations with Pre-Finetuning

Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta

Published 26 January 2021 · EMNLP 2021

Tasks: Question Answering · Sentence Completion · Sentiment Analysis · Abstractive Text Summarization · Text Summarization · Natural Language Inference · Common Sense Reasoning · Multi-Task Learning · Language Modelling
Paper · PDF · Code (official)

Abstract

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used, up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.
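Concretely, pre-finetuning amounts to a single multi-task training loop over many heterogeneous datasets: a shared encoder, one lightweight head per task, and per-task loss scaling so that tasks with different label-space sizes contribute comparable gradients. The sketch below is a minimal illustration of that shape, not the authors' implementation; the task registry, pooling choice, and 1/log(num_labels) scaling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Shared encoder with one lightweight head per task. Task names, label
# counts, and the pooling/scaling choices below are illustrative.
encoder = AutoModel.from_pretrained("roberta-base")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

tasks = {"sst2": 2, "mnli": 3, "boolq": 2}  # hypothetical task registry
heads = nn.ModuleDict({
    name: nn.Linear(encoder.config.hidden_size, n_labels)
    for name, n_labels in tasks.items()
})

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(heads.parameters()), lr=1e-5
)

def pre_finetune_step(batches):
    """One update over a heterogeneous batch: one mini-batch per task."""
    optimizer.zero_grad()
    for task, (texts, labels) in batches.items():
        enc = tokenizer(texts, return_tensors="pt",
                        padding=True, truncation=True)
        hidden = encoder(**enc).last_hidden_state[:, 0]  # first-token pooling
        logits = heads[task](hidden)
        # Scale each task's loss by 1/log(num_labels) so tasks with larger
        # label spaces do not dominate the shared encoder's gradient
        # (one plausible scheme; not necessarily the paper's exact recipe).
        scale = 1.0 / torch.log(torch.tensor(float(tasks[task])))
        (scale * nn.functional.cross_entropy(logits, labels)).backward()
    optimizer.step()  # gradients accumulated across all tasks
```

In the paper the task set is far larger (around 50 datasets spanning classification, QA, summarization, and commonsense reasoning), and generation tasks use BART's decoder rather than a linear classification head.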

Results

Task                           | Dataset                       | Metric   | Value | Model
-------------------------------|-------------------------------|----------|-------|---------------------
Question Answering             | BoolQ                         | Accuracy | 87.5  | MUPPET RoBERTa Large
Question Answering             | BoolQ                         | Accuracy | 83.8  | MUPPET RoBERTa Base
Common Sense Reasoning         | CommonsenseQA                 | Accuracy | 79.2  | MUPPET RoBERTa Large
Sentiment Analysis             | SST-2 (binary classification) | Accuracy | 97.4  | MUPPET RoBERTa Large
Sentiment Analysis             | SST-2 (binary classification) | Accuracy | 96.7  | MUPPET RoBERTa Base
Text Summarization             | Reddit TIFU                   | ROUGE-1  | 30.3  | MUPPET BART Large
Text Summarization             | Reddit TIFU                   | ROUGE-2  | 11.25 | MUPPET BART Large
Text Summarization             | Reddit TIFU                   | ROUGE-L  | 24.92 | MUPPET BART Large
Text Summarization             | GigaWord                      | ROUGE-1  | 40.4  | MUPPET BART Large
Text Summarization             | GigaWord                      | ROUGE-2  | 20.54 | MUPPET BART Large
Text Summarization             | GigaWord                      | ROUGE-L  | 36.21 | MUPPET BART Large
Text Summarization             | CNN / Daily Mail              | ROUGE-1  | 44.45 | MUPPET BART Large
Text Summarization             | CNN / Daily Mail              | ROUGE-2  | 21.25 | MUPPET BART Large
Text Summarization             | CNN / Daily Mail              | ROUGE-L  | 41.4  | MUPPET BART Large
Abstractive Text Summarization | CNN / Daily Mail              | ROUGE-1  | 44.45 | MUPPET BART Large
Abstractive Text Summarization | CNN / Daily Mail              | ROUGE-2  | 21.25 | MUPPET BART Large
Abstractive Text Summarization | CNN / Daily Mail              | ROUGE-L  | 41.4  | MUPPET BART Large
Sentence Completion            | HellaSwag                     | Accuracy | 86.4  | MUPPET RoBERTa Large
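The pre-finetuned checkpoints behind these numbers were released by the authors. As a quick start, the sketch below loads one as a drop-in RoBERTa replacement via Hugging Face transformers; the model identifier is an assumption based on the release's naming, and the classification head is randomly initialized, so fine-tuning on the target task (e.g. SST-2) is still required to reproduce the accuracies above.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model id is an assumption based on the official release naming on the
# Hugging Face Hub; swap in "facebook/muppet-roberta-large" for the larger model.
name = "facebook/muppet-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# The sequence-classification head is freshly initialized here, so these
# probabilities are meaningless until the model is fine-tuned on the task.
inputs = tokenizer("A gorgeous, witty, seductive movie.", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)
```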
