Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Better Fine-Tuning by Reducing Representational Collapse

Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Published: 2020-08-06 (ICLR 2021)
Tasks: Abstractive Text Summarization, Text Summarization, Cross-Lingual Natural Language Inference

Abstract

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representation collapse: the pre-trained models maintain more generalizable representations every time they are fine-tuned.
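The core idea in the abstract — replace an adversarial trust-region objective with cheap parametric noise and penalize how far the output distribution moves under that noise — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the toy `model`, and the `lam`/`sigma` values, are hypothetical stand-ins; in the actual method the noise is added to a transformer's token embeddings and the penalty is a symmetric KL between the clean and noised output distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # numerically stable softmax over the last axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def symmetric_kl(p, q, eps=1e-12):
    # KL(p || q) + KL(q || p), summed over the batch
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def r3f_style_loss(model, embeddings, labels, lam=1.0, sigma=1e-5):
    """Task loss + lam * symmetric KL between clean and noised predictions.

    `model` maps input embeddings to class probabilities; noise is sampled
    from a normal distribution (the paper also allows a uniform distribution).
    """
    noise = rng.normal(0.0, sigma, size=embeddings.shape)  # parametric noise
    p_clean = model(embeddings)
    p_noisy = model(embeddings + noise)
    # cross-entropy on the clean forward pass
    task_loss = -np.log(p_clean[np.arange(len(labels)), labels]).mean()
    return task_loss + lam * symmetric_kl(p_clean, p_noisy)

# Toy linear "model" just to exercise the objective
W = rng.normal(size=(4, 3))
model = lambda emb: softmax(emb @ W)
emb = rng.normal(size=(2, 4))
labels = np.array([0, 2])
loss = r3f_style_loss(model, emb, labels)
```

Because the noise draw is cheap compared with the inner gradient-ascent steps of adversarial objectives, each training step needs only one extra forward pass, which is the source of the speedup claimed above.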

Results

Task | Dataset | Metric | Value | Model
Text Summarization | Reddit TIFU | ROUGE-1 | 30.31 | BART+R3F
Text Summarization | Reddit TIFU | ROUGE-2 | 10.98 | BART+R3F
Text Summarization | Reddit TIFU | ROUGE-L | 24.74 | BART+R3F
Text Summarization | GigaWord | ROUGE-1 | 40.45 | BART-RXF
Text Summarization | GigaWord | ROUGE-2 | 20.69 | BART-RXF
Text Summarization | GigaWord | ROUGE-L | 36.56 | BART-RXF
Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.38 | BART+R3F
Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.53 | BART+R3F
Text Summarization | CNN / Daily Mail | ROUGE-L | 41.17 | BART+R3F
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.38 | BART+R3F
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.53 | BART+R3F
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 41.17 | BART+R3F

Related Papers

- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
- On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention (2025-06-11)
- Improving large language models with concept-aware fine-tuning (2025-06-09)
- Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs (2025-06-03)
- ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs (2025-05-29)
- MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection (2025-05-29)
- APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization (2025-05-26)
- FiLLM -- A Filipino-optimized Large Language Model based on Southeast Asia Large Language Model (SEALLM) (2025-05-25)