
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

Mohsen Gholami, Mohammad Akbari, Cindy Hu, Vaden Masrani, Z. Jane Wang, Yong Zhang

2024-03-28 · Data-free Knowledge Distillation · Knowledge Distillation
Paper · PDF

Abstract

Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework that employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improve the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM with average improvements of 5% and 14%, respectively. We also show that the proposed method is applicable to less explored and novel tasks. The code is available.
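
The abstract mentions an energy-based OOD evaluation for filtering noisy generated data. The sketch below is a minimal, hypothetical illustration of that idea using the standard energy score over classifier logits; the function names, the `threshold` hyperparameter, and the selection rule are assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumed, not the paper's code): energy-based OOD scoring
# over student-classifier logits, E(x) = -T * logsumexp(f(x) / T).
# Lower energy ~ in-distribution; higher energy ~ tail / noisy sample.
import torch


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Per-sample energy score from logits of shape (batch, num_classes)."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)


def split_generated_batch(logits: torch.Tensor, threshold: float):
    """Split LLM-generated samples into a low-energy set kept for distillation
    and a high-energy set treated as noisy/OOD. `threshold` is a hypothetical
    hyperparameter used here only for illustration."""
    scores = energy_score(logits)
    keep = scores < threshold
    return keep, ~keep
```

Under the iterative feedback mechanism described in the abstract, the high-energy (out-of-distribution or noisy) generations could be discarded or used to guide the LLM's next generation round; the exact feedback rule is defined in the paper, not in this sketch.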

Results

Task | Dataset | Metric | Value | Model
Knowledge Distillation | SQuAD | Exact Match | 75.2 | GOLD (T5-base)
Knowledge Distillation | QNLI | Accuracy | 91.7 | GOLD (T5-base)
Data-free Knowledge Distillation | SQuAD | Exact Match | 75.2 | GOLD (T5-base)
Data-free Knowledge Distillation | QNLI | Accuracy | 91.7 | GOLD (T5-base)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning (2025-07-14)
KAT-V1: Kwai-AutoThink Technical Report (2025-07-11)
Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift (2025-07-11)
SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation (2025-07-11)