Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel
Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.
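The core mechanism of (IA)$^3$ described above — scaling intermediate activations element-wise by learned vectors — can be sketched as follows. This is an illustrative NumPy sketch of the general idea, not the paper's actual implementation; the function and variable names here are hypothetical.

```python
import numpy as np

def ia3_scale(activations, learned_vector):
    """Element-wise rescaling of activations by a learned vector.

    activations:    array of shape (batch, seq_len, d)
    learned_vector: array of shape (d,), broadcast over batch and sequence
    """
    return activations * learned_vector

# The learned vectors are initialized to ones, so at initialization the
# modified model computes exactly the same function as the frozen base model.
d_model = 8
l_k = np.ones(d_model)                     # e.g. a vector rescaling attention keys
keys = np.random.randn(2, 4, d_model)      # (batch=2, seq_len=4, d_model)
assert np.allclose(ia3_scale(keys, l_k), keys)

# Only these vectors are trained: d parameters per rescaled activation,
# versus d*d for a full weight matrix, hence the tiny parameter overhead.
print(l_k.size, "trainable params vs", d_model * d_model, "for a full matrix")
```

During fine-tuning, only the rescaling vectors receive gradient updates while all base-model weights stay frozen, which is what makes the method parameter-efficient.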
| Task | Dataset | Subset | Accuracy | Model |
|---|---|---|---|---|
| Text Classification | RAFT | Over | 0.95 | T-Few |
| Text Classification | RAFT | ADE | 0.804 | T-Few |
| Text Classification | RAFT | Avg | 0.758 | T-Few |
| Text Classification | RAFT | B77 | 0.695 | T-Few |
| Text Classification | RAFT | NIS | 0.833 | T-Few |
| Text Classification | RAFT | OSE | 0.676 | T-Few |
| Text Classification | RAFT | SOT | 0.915 | T-Few |
| Text Classification | RAFT | SRI | 0.508 | T-Few |
| Text Classification | RAFT | TAI | 0.736 | T-Few |
| Text Classification | RAFT | TC | 0.879 | T-Few |
| Text Classification | RAFT | TEH | 0.586 | T-Few |
| Text Classification | RAFT | ToS | 0.75 | T-Few |