Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta
Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampled from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning whenever doing so does not hurt performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including CNN/DailyMail, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models maintain more generalizable representations each time they are fine-tuned.
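As a rough illustration of the objective described above, the following is a minimal PyTorch sketch: the adversarial inner loop of prior trust region methods is replaced by a single draw of parametric noise on the input embeddings, and a symmetric KL term ties the perturbed predictions to the clean ones. It assumes a HuggingFace-style model that accepts `inputs_embeds` and returns `.logits`; the function name `r3f_loss` and the hyper-parameter values `eps` and `lam` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def r3f_loss(model, input_embeds, labels, eps=1e-5, lam=1.0, noise_type="normal"):
    """Sketch of a noise-based trust region objective: task loss plus a
    symmetric KL term between predictions on clean and perturbed embeddings."""
    # Standard fine-tuning pass on the clean embeddings.
    logits = model(inputs_embeds=input_embeds).logits
    task_loss = F.cross_entropy(logits, labels)

    # Parametric noise replaces the adversarial objective: a single sample
    # from a normal or uniform distribution, no inner optimization loop.
    if noise_type == "normal":
        noise = torch.randn_like(input_embeds) * eps
    else:
        noise = torch.empty_like(input_embeds).uniform_(-eps, eps)
    noised_logits = model(inputs_embeds=input_embeds + noise).logits

    # Symmetric KL keeps the model's predictions within a trust region
    # around its output on the unperturbed input.
    p = F.log_softmax(logits, dim=-1)
    q = F.log_softmax(noised_logits, dim=-1)
    sym_kl = (
        F.kl_div(p, q, log_target=True, reduction="batchmean")
        + F.kl_div(q, p, log_target=True, reduction="batchmean")
    )
    return task_loss + lam * sym_kl
```

Because the noise is sampled rather than optimized, each training step costs only one extra forward pass, which is the source of the speedup over adversarial trust region methods claimed above.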
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Summarization | Reddit TIFU | ROUGE-1 | 30.31 | BART+R3F |
| Text Summarization | Reddit TIFU | ROUGE-2 | 10.98 | BART+R3F |
| Text Summarization | Reddit TIFU | ROUGE-L | 24.74 | BART+R3F |
| Text Summarization | Gigaword | ROUGE-1 | 40.45 | BART+R3F |
| Text Summarization | Gigaword | ROUGE-2 | 20.69 | BART+R3F |
| Text Summarization | Gigaword | ROUGE-L | 36.56 | BART+R3F |
| Text Summarization | CNN / Daily Mail | ROUGE-1 | 44.38 | BART+R3F |
| Text Summarization | CNN / Daily Mail | ROUGE-2 | 21.53 | BART+R3F |
| Text Summarization | CNN / Daily Mail | ROUGE-L | 41.17 | BART+R3F |