Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Umar Asif, Jianbin Tang, Stefan Harrer

Published: 2019-09-17. Tasks: Ensemble Learning, General Classification, Knowledge Distillation.

Abstract

Ensemble models comprising deep Convolutional Neural Networks (CNNs) have shown significant improvements in model generalization, but at the cost of large computation and memory requirements. In this paper, we present a framework for learning compact CNN models with improved classification performance and model generalization. To this end, we propose a compact student CNN with parallel branches that are trained using ground-truth labels and information from high-capacity teacher networks in an ensemble-learning fashion. Our framework provides two main benefits: i) distilling knowledge from different teachers into the student network promotes heterogeneity in feature learning across the branches of the student network and enables the network to learn diverse solutions to the target problem; ii) coupling the branches of the student network through ensembling encourages collaboration and improves the quality of the final predictions by reducing the variance of the network outputs. Experiments on the well-established CIFAR-10 and CIFAR-100 datasets show that our Ensemble Knowledge Distillation (EKD) improves classification accuracy and model generalization, especially in situations with limited training data. Experiments also show that our EKD-based compact networks outperform state-of-the-art knowledge-distillation methods in terms of mean accuracy on the test datasets.
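To make the training objective concrete, the sketch below gives one plausible reading of the abstract in PyTorch: each branch of the student is distilled from a different frozen teacher with a temperature-scaled soft-target loss plus a ground-truth cross-entropy term, and the averaged (ensembled) branch logits are supervised as well. The function name ekd_loss, the one-to-one pairing of branch i with teacher i, and the weights T and alpha are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of an ensemble knowledge distillation (EKD) loss, assuming
# PyTorch and a student with parallel branches paired with frozen teachers.
import torch
import torch.nn.functional as F


def ekd_loss(branch_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """branch_logits: list of [B, C] student-branch logits.
    teacher_logits: list of [B, C] logits from frozen teachers (one per branch).
    T is the distillation temperature; alpha balances hard vs. soft targets."""
    total = 0.0
    for s_logit, t_logit in zip(branch_logits, teacher_logits):
        # Ground-truth supervision for this branch.
        ce = F.cross_entropy(s_logit, labels)
        # Temperature-scaled soft-target (KL) term from the paired teacher.
        kd = F.kl_div(
            F.log_softmax(s_logit / T, dim=1),
            F.softmax(t_logit / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        total = total + (1 - alpha) * ce + alpha * kd
    # Couple the branches: supervise the ensembled (averaged) prediction too.
    ens_logit = torch.stack(branch_logits).mean(dim=0)
    total = total + F.cross_entropy(ens_logit, labels)
    return total
```

At test time only the compact student would be kept, with its final prediction taken as the mean of the branch outputs; the ensembling term above is what encourages that averaged prediction to be accurate.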

Results

Task | Dataset | Metric | Value | Model
Knowledge Distillation | ImageNet | Top-1 accuracy (%) | 78.79 | ADLIK-MO-P25 (T: SeNet154, ResNet152b; S: ResNet-50-prune25%)
Knowledge Distillation | ImageNet | Top-1 accuracy (%) | 78.07 | ADLIK-MO-P375 (T: SeNet154, ResNet152b; S: ResNet-50-prune37.5)
Knowledge Distillation | ImageNet | Top-1 accuracy (%) | 76.376 | ADLIK-MO-P50 (T: SeNet154, ResNet152b; S: ResNet-50-half)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning (2025-07-14)
KAT-V1: Kwai-AutoThink Technical Report (2025-07-11)
Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift (2025-07-11)