Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji

2023-04-27 · Question Answering · Sentence Completion · Coreference Resolution · Natural Language Inference · Common Sense Reasoning · Word Sense Disambiguation · Language Modelling

Paper · PDF · Code (official)

Abstract

Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly-generated instructions. In addition to being sizable, we design our instructions to cover a broad set of topics to ensure diversity. Extensive analysis of our instruction dataset confirms its diversity, and we generate responses for these instructions using gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families, with varying sizes. We evaluate the performance of our models using automatic metrics on 15 different natural language processing (NLP) benchmarks, as well as through human assessment. The results demonstrate that our proposed LaMini-LM models are comparable to competitive baselines, while being much smaller in size.
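The pipeline the abstract describes amounts to sequence-level knowledge distillation: collect a large pool of instructions, query a teacher model for responses, and fine-tune a small student on the resulting pairs. A minimal sketch of the pair-construction step is below; the function names and record format are illustrative (not from the paper's code), and `query_teacher` is a placeholder standing in for the gpt-3.5-turbo API call the authors used.

```python
def query_teacher(instruction: str) -> str:
    # Placeholder teacher: a real pipeline would call the gpt-3.5-turbo API here.
    return f"Response to: {instruction}"

def build_distillation_pairs(instructions):
    """Pair each instruction with a teacher-generated response,
    producing the (instruction, response) records used to fine-tune a student."""
    return [
        {"instruction": ins, "response": query_teacher(ins)}
        for ins in instructions
    ]

pairs = build_distillation_pairs([
    "Explain photosynthesis in one sentence.",
    "List three uses of a paperclip.",
])
print(len(pairs))  # 2
```

In the paper this step is run over 2.58M instructions, and the resulting dataset is used to fine-tune both encoder-decoder and decoder-only students.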

Results

Task | Dataset | Metric | Value | Model
Question Answering | PIQA | Accuracy | 72.2 | FLAN-T5-Large 783M
Question Answering | PIQA | Accuracy | 71.3 | LaMini-GPT 1.5B
Question Answering | PIQA | Accuracy | 70.6 | LaMini-F-T5 783M
Question Answering | PIQA | Accuracy | 70.5 | GPT-2-XL 1.5B
Question Answering | PIQA | Accuracy | 67.2 | LaMini-T5 738M
Question Answering | PIQA | Accuracy | 55.9 | T5-Large 738M
Question Answering | OpenBookQA | Accuracy | 39.8 | LaMini-GPT 1.5B
Question Answering | OpenBookQA | Accuracy | 36 | LaMini-T5 738M
Question Answering | OpenBookQA | Accuracy | 34 | LaMini-F-T5 783M
Question Answering | OpenBookQA | Accuracy | 32.8 | T5-Large 738M
Question Answering | OpenBookQA | Accuracy | 32 | GPT-2-XL 1.5B
Question Answering | OpenBookQA | Accuracy | 31.2 | FLAN-T5-Large 783M
Common Sense Reasoning | WinoGrande | Accuracy | 59.9 | FLAN-T5-Large 783M
Common Sense Reasoning | WinoGrande | Accuracy | 58.3 | GPT-2-XL 1.5B
Common Sense Reasoning | WinoGrande | Accuracy | 56 | LaMini-F-T5 783M
Common Sense Reasoning | WinoGrande | Accuracy | 56 | LaMini-GPT 1.5B
Common Sense Reasoning | WinoGrande | Accuracy | 55.2 | T5-Large 738M
Common Sense Reasoning | WinoGrande | Accuracy | 54.9 | LaMini-T5 738M
Word Sense Disambiguation | Words in Context | Accuracy | 64.7 | FLAN-T5-Large 783M
Word Sense Disambiguation | Words in Context | Accuracy | 63.8 | LaMini-F-T5 783M
Word Sense Disambiguation | Words in Context | Accuracy | 52.4 | LaMini-GPT 1.5B
Word Sense Disambiguation | Words in Context | Accuracy | 50.5 | LaMini-T5 738M
Word Sense Disambiguation | Words in Context | Accuracy | 49.8 | GPT-2-XL 1.5B
Natural Language Inference | MultiNLI | Matched | 72.4 | T5-Large 738M
Natural Language Inference | MultiNLI | Mismatched | 72 | T5-Large 738M
Natural Language Inference | MultiNLI | Matched | 67.5 | LaMini-GPT 1.5B
Natural Language Inference | MultiNLI | Mismatched | 69.3 | LaMini-GPT 1.5B
Natural Language Inference | MultiNLI | Matched | 61.4 | LaMini-F-T5 783M
Natural Language Inference | MultiNLI | Mismatched | 61 | LaMini-F-T5 783M
Natural Language Inference | MultiNLI | Matched | 54.7 | LaMini-T5 738M
Natural Language Inference | MultiNLI | Mismatched | 55.8 | LaMini-T5 738M
Natural Language Inference | MultiNLI | Matched | 36.5 | GPT-2-XL 1.5B
Natural Language Inference | MultiNLI | Mismatched | 37 | GPT-2-XL 1.5B
Coreference Resolution | Winograd Schema Challenge | Accuracy | 73.3 | GPT-2-XL 1.5B
Coreference Resolution | Winograd Schema Challenge | Accuracy | 69.6 | LaMini-GPT 1.5B
Coreference Resolution | Winograd Schema Challenge | Accuracy | 66.7 | T5-Large 738M
Coreference Resolution | Winograd Schema Challenge | Accuracy | 64.1 | LaMini-F-T5 783M
Coreference Resolution | Winograd Schema Challenge | Accuracy | 59 | LaMini-T5 738M
Sentence Completion | HellaSwag | Accuracy | 50.9 | GPT-2-XL 1.5B
Sentence Completion | HellaSwag | Accuracy | 48.7 | FLAN-T5-Large 783M
Sentence Completion | HellaSwag | Accuracy | 48.3 | LaMini-GPT 1.5B
Sentence Completion | HellaSwag | Accuracy | 43.7 | LaMini-F-T5 783M
Sentence Completion | HellaSwag | Accuracy | 40.6 | LaMini-T5 738M
Sentence Completion | HellaSwag | Accuracy | 38.9 | T5-Large 738M
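For a quick reading of any one benchmark, the rows above can be treated as records and sorted by accuracy. The sketch below does this for the PIQA rows; the values are copied from the table, and the helper is illustrative, not part of the paper's evaluation code.

```python
# PIQA accuracy rows from the results table above: (model, accuracy).
piqa = [
    ("FLAN-T5-Large 783M", 72.2),
    ("LaMini-GPT 1.5B", 71.3),
    ("LaMini-F-T5 783M", 70.6),
    ("GPT-2-XL 1.5B", 70.5),
    ("LaMini-T5 738M", 67.2),
    ("T5-Large 738M", 55.9),
]

# Sort models best-first by accuracy.
ranking = sorted(piqa, key=lambda row: row[1], reverse=True)
print(ranking[0])  # ('FLAN-T5-Large 783M', 72.2)
```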

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)