Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji
Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities, but they are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly generated instructions. Beyond its size, we design our instruction set to cover a broad range of topics to ensure diversity. Extensive analysis of our instruction dataset confirms its diversity, and we generate responses for these instructions using gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models of varying sizes from both the encoder-decoder and decoder-only families. We evaluate the performance of our models using automatic metrics on 15 natural language processing (NLP) benchmarks, as well as through human assessment. The results demonstrate that our proposed LaMini-LM models are comparable to competitive baselines while being much smaller in size.
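The distillation recipe described above can be sketched as a simple data pipeline: collect a teacher response for each instruction and format the pair as a fine-tuning example for the student. The prompt template and the `query_teacher` stub below are illustrative assumptions, not the paper's exact implementation; in the paper the teacher is gpt-3.5-turbo, accessed via API.

```python
# Minimal sketch of the instruction-distillation data pipeline:
# each instruction is sent to a teacher model, and the (prompt, response)
# pair becomes one training example for the small student model.

def query_teacher(instruction: str) -> str:
    """Stand-in for a teacher LLM call (gpt-3.5-turbo in the paper)."""
    return f"[teacher response to: {instruction}]"

# Assumed Alpaca-style template; the actual template may differ.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_distillation_pairs(instructions):
    """Turn raw instructions into (prompt, target) fine-tuning pairs."""
    pairs = []
    for instruction in instructions:
        prompt = PROMPT_TEMPLATE.format(instruction=instruction)
        target = query_teacher(instruction)
        pairs.append({"prompt": prompt, "target": target})
    return pairs

pairs = build_distillation_pairs(["Name three primary colors."])
print(pairs[0]["prompt"])
```

The resulting pairs would then be fed to a standard supervised fine-tuning loop for the student model (T5- or GPT-family in LaMini-LM).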
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | PIQA | Accuracy | 72.2 | FLAN-T5-Large 783M |
| Question Answering | PIQA | Accuracy | 71.3 | LaMini-GPT 1.5B |
| Question Answering | PIQA | Accuracy | 70.6 | LaMini-F-T5 783M |
| Question Answering | PIQA | Accuracy | 70.5 | GPT-2-XL 1.5B |
| Question Answering | PIQA | Accuracy | 67.2 | LaMini-T5 738M |
| Question Answering | PIQA | Accuracy | 55.9 | T5-Large 738M |
| Question Answering | OpenBookQA | Accuracy | 39.8 | LaMini-GPT 1.5B |
| Question Answering | OpenBookQA | Accuracy | 36.0 | LaMini-T5 738M |
| Question Answering | OpenBookQA | Accuracy | 34.0 | LaMini-F-T5 783M |
| Question Answering | OpenBookQA | Accuracy | 32.8 | T5-Large 738M |
| Question Answering | OpenBookQA | Accuracy | 32.0 | GPT-2-XL 1.5B |
| Question Answering | OpenBookQA | Accuracy | 31.2 | FLAN-T5-Large 783M |
| Common Sense Reasoning | WinoGrande | Accuracy | 59.9 | FLAN-T5-Large 783M |
| Common Sense Reasoning | WinoGrande | Accuracy | 58.3 | GPT-2-XL 1.5B |
| Common Sense Reasoning | WinoGrande | Accuracy | 56.0 | LaMini-F-T5 783M |
| Common Sense Reasoning | WinoGrande | Accuracy | 56.0 | LaMini-GPT 1.5B |
| Common Sense Reasoning | WinoGrande | Accuracy | 55.2 | T5-Large 738M |
| Common Sense Reasoning | WinoGrande | Accuracy | 54.9 | LaMini-T5 738M |
| Word Sense Disambiguation | Words in Context | Accuracy | 64.7 | FLAN-T5-Large 783M |
| Word Sense Disambiguation | Words in Context | Accuracy | 63.8 | LaMini-F-T5 783M |
| Word Sense Disambiguation | Words in Context | Accuracy | 52.4 | LaMini-GPT 1.5B |
| Word Sense Disambiguation | Words in Context | Accuracy | 50.5 | LaMini-T5 738M |
| Word Sense Disambiguation | Words in Context | Accuracy | 49.8 | GPT-2-XL 1.5B |
| Natural Language Inference | MultiNLI | Matched Accuracy | 72.4 | T5-Large 738M |
| Natural Language Inference | MultiNLI | Mismatched Accuracy | 72.0 | T5-Large 738M |
| Natural Language Inference | MultiNLI | Matched Accuracy | 67.5 | LaMini-GPT 1.5B |
| Natural Language Inference | MultiNLI | Mismatched Accuracy | 69.3 | LaMini-GPT 1.5B |
| Natural Language Inference | MultiNLI | Matched Accuracy | 61.4 | LaMini-F-T5 783M |
| Natural Language Inference | MultiNLI | Mismatched Accuracy | 61.0 | LaMini-F-T5 783M |
| Natural Language Inference | MultiNLI | Matched Accuracy | 54.7 | LaMini-T5 738M |
| Natural Language Inference | MultiNLI | Mismatched Accuracy | 55.8 | LaMini-T5 738M |
| Natural Language Inference | MultiNLI | Matched Accuracy | 36.5 | GPT-2-XL 1.5B |
| Natural Language Inference | MultiNLI | Mismatched Accuracy | 37.0 | GPT-2-XL 1.5B |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 73.3 | GPT-2-XL 1.5B |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 69.6 | LaMini-GPT 1.5B |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 66.7 | T5-Large 738M |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 64.1 | LaMini-F-T5 783M |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 59.0 | LaMini-T5 738M |
| Sentence Completion | HellaSwag | Accuracy | 50.9 | GPT-2-XL 1.5B |
| Sentence Completion | HellaSwag | Accuracy | 48.7 | FLAN-T5-Large 783M |
| Sentence Completion | HellaSwag | Accuracy | 48.3 | LaMini-GPT 1.5B |
| Sentence Completion | HellaSwag | Accuracy | 43.7 | LaMini-F-T5 783M |
| Sentence Completion | HellaSwag | Accuracy | 40.6 | LaMini-T5 738M |
| Sentence Completion | HellaSwag | Accuracy | 38.9 | T5-Large 738M |
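The accuracy numbers above come from multiple-choice benchmarks (PIQA, OpenBookQA, HellaSwag, etc.), which are typically scored by having the model rank each candidate answer and counting how often the top-ranked candidate matches the gold label. A minimal sketch of that scoring loop, with a toy stand-in for the model's scoring function (real harnesses use per-token log-likelihoods):

```python
# Generic multiple-choice accuracy: for each example, score every candidate
# continuation and predict the highest-scoring one.

def evaluate_accuracy(examples, score):
    """examples: dicts with 'context', 'choices', integer 'label'.
    score(context, candidate) -> float; higher means more likely."""
    correct = 0
    for ex in examples:
        scores = [score(ex["context"], cand) for cand in ex["choices"]]
        prediction = max(range(len(scores)), key=scores.__getitem__)
        correct += int(prediction == ex["label"])
    return correct / len(examples)

# Toy usage: a length-based scorer picks the longer candidate.
toy = [{"context": "Q", "choices": ["short", "a much longer answer"], "label": 1}]
print(evaluate_accuracy(toy, lambda ctx, cand: len(cand)))  # → 1.0
```

In practice the candidate scores would be model log-likelihoods (often length-normalized), which is where design choices across evaluation harnesses can shift the reported numbers slightly.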