TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Rethinking Soft Labels for Knowledge Distillation: A Bias-...

Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective

Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang

2021-02-01Knowledge Distillation
PaperPDFCodeCodeCodeCode(official)

Abstract

Knowledge distillation is an effective approach to leverage a well-trained network or an ensemble of them, named as the teacher, to guide the training of a student network. The outputs from the teacher network are used as soft labels for supervising the training of a new network. Recent studies \citep{muller2019does,yuan2020revisiting} revealed an intriguing property of the soft labels that making labels soft serves as a good regularization to the student network. From the perspective of statistical learning, regularization aims to reduce the variance, however how bias and variance change is not clear for training with soft labels. In this paper, we investigate the bias-variance tradeoff brought by distillation with soft labels. Specifically, we observe that during training the bias-variance tradeoff varies sample-wisely. Further, under the same distillation temperature setting, we observe that the distillation performance is negatively associated with the number of some specific samples, which are named as regularization samples since these samples lead to bias increasing and variance decreasing. Nevertheless, we empirically find that completely filtering out regularization samples also deteriorates distillation performance. Our discoveries inspired us to propose the novel weighted soft labels to help the network adaptively handle the sample-wise bias-variance tradeoff. Experiments on standard evaluation benchmarks validate the effectiveness of our method. Our code is available at \url{https://github.com/bellymonster/Weighted-Soft-Label-Distillation}.

Results

TaskDatasetMetricValueModel
Knowledge DistillationImageNetTop-1 accuracy %72.04WSL (T: ResNet-34 S:ResNet-18)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces2025-07-17DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition2025-07-16HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training2025-07-15Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning2025-07-14KAT-V1: Kwai-AutoThink Technical Report2025-07-11Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift2025-07-11SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation2025-07-11