Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Knowledge Distillation on ImageNet

Metric: Top-1 accuracy % (higher is better)
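Top-1 accuracy is the fraction of validation images whose single highest-scoring class matches the ground-truth label. A minimal sketch (the logits and labels below are toy values, not ImageNet data):

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    preds = logits.argmax(axis=1)
    return float((preds == labels).mean())

# Toy batch: 4 samples, 3 classes.
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.1, 0.2, 3.0],
                   [1.0, 0.9, 0.8]])
labels = np.array([0, 1, 2, 1])
print(top1_accuracy(logits, labels))  # 0.75 (3 of 4 correct)
```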


Results

| # | Model | Top-1 accuracy % | Extra Data | Paper | Date | Code |
|---|-------|------------------|------------|-------|------|------|
| 1 | ScaleKD (T: BEiT-L, S: ViT-B/14) | 86.43 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 2 | ScaleKD (T: Swin-L, S: ViT-B/16) | 85.53 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 3 | ScaleKD (T: Swin-L, S: ViT-S/16) | 83.93 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 4 | ScaleKD (T: Swin-L, S: Swin-T) | 83.8 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 5 | KD++ (T: RegNetY-16GF, S: ViT-B) | 83.6 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 6 | V_kD (T: RegNetY-160, S: DeiT-S) | 82.9 | No | V_kD: Improving Knowledge Distillation using O... | 2024-03-10 | Code |
| 7 | SpectralKD (T: Swin-S, S: Swin-T) | 82.7 | No | SpectralKD: A Unified Framework for Interpreting... | 2024-12-26 | Code |
| 8 | ScaleKD (T: Swin-L, S: ResNet-50) | 82.55 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 9 | DiffKD (T: Swin-L, S: Swin-T) | 82.5 | No | Knowledge Diffusion for Distillation | 2023-05-25 | Code |
| 10 | DIST (T: Swin-L, S: Swin-T) | 82.3 | Yes | Knowledge Distillation from A Stronger Teacher | 2022-05-21 | Code |
| 11 | SpectralKD (T: CaiT-S24, S: DeiT-S) | 82.2 | No | SpectralKD: A Unified Framework for Interpreting... | 2024-12-26 | Code |
| 12 | SRD (T: RegNetY-160, S: DeiT-S) | 82.1 | No | Understanding the Role of the Projector in Knowl... | 2023-03-20 | Code |
| 13 | OFA (T: ViT-B, S: ResNet-50) | 81.33 | No | One-for-All: Bridge the Gap Between Heterogeneou... | 2023-10-30 | Code |
| 14 | DiffKD (T: Swin-L, S: ResNet-50) | 80.5 | No | Knowledge Diffusion for Distillation | 2023-05-25 | Code |
| 15 | V_kD (T: RegNetY-160, S: DeiT-Ti) | 79.2 | No | V_kD: Improving Knowledge Distillation using O... | 2024-03-10 | Code |
| 16 | KD++ (T: ResNet-152, S: ResNet-101) | 79.15 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 17 | ADLIK-MO-P25 (T: SENet-154 + ResNet-152b, S: ResNet-50, 25% pruned) | 78.79 | No | Ensemble Knowledge Distillation for Learning Imp... | 2019-09-17 | Code |
| 18 | ADLIK-MO-P375 (T: SENet-154 + ResNet-152b, S: ResNet-50, 37.5% pruned) | 78.07 | No | Ensemble Knowledge Distillation for Learning Imp... | 2019-09-17 | Code |
| 19 | KD++ (T: ResNet-152, S: ResNet-50) | 77.48 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 20 | SpectralKD (T: CaiT-S24, S: DeiT-T) | 77.4 | No | SpectralKD: A Unified Framework for Interpreting... | 2024-12-26 | Code |
| 21 | SRD (T: RegNetY-160, S: DeiT-Ti) | 77.2 | No | Understanding the Role of the Projector in Knowl... | 2023-03-20 | Code |
| 22 | ADLIK-MO (T: ResNet-101, S: ResNet-50) | 77.14 | No | Distilling the Knowledge in a Neural Network | 2015-03-09 | Code |
| 23 | WTTM (T: DeiT III-Small, S: DeiT-Tiny) | 77.03 | No | Knowledge Distillation Based on Transformed Teac... | 2024-02-17 | Code |
| 24 | ADLIK-MO-P50 (T: SENet-154 + ResNet-152b, S: ResNet-50-half) | 76.376 | No | Ensemble Knowledge Distillation for Learning Imp... | 2019-09-17 | Code |
| 25 | KD++ (T: ResNet-152, S: ResNet-34) | 75.53 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 26 | WTTM (T: ResNet-50, S: MobileNet-V1) | 73.09 | No | Knowledge Distillation Based on Transformed Teac... | 2024-02-17 | Code |
| 27 | ReviewKD++ (T: ResNet-50, S: MobileNet-V1) | 72.96 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 28 | KD++ (T: ResNet-152, S: ResNet-18) | 72.54 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 29 | KD++ (T: ResNet-101, S: ResNet-18) | 72.54 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 30 | KD++ (T: ResNet-50, S: ResNet-18) | 72.53 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 31 | HSAKD (T: ResNet-34, S: ResNet-18) | 72.39 | No | Hierarchical Self-supervised Augmented Knowledge... | 2021-07-29 | Code |
| 32 | ICKD (T: ResNet-34, S: ResNet-18) | 72.19 | No | - | - | Code |
| 33 | WTTM (T: ResNet-34, S: ResNet-18) | 72.19 | No | Knowledge Distillation Based on Transformed Teac... | 2024-02-17 | Code |
| 34 | DIST (T: ResNet-34, S: ResNet-18) | 72.07 | No | Knowledge Distillation from A Stronger Teacher | 2022-05-21 | Code |
| 35 | KD++ (T: ResNet-34, S: ResNet-18) | 72.07 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 36 | WSL (T: ResNet-34, S: ResNet-18) | 72.04 | No | Rethinking Soft Labels for Knowledge Distillatio... | 2021-02-01 | Code |
| 37 | CRCD (T: ResNet-34, S: ResNet-18) | 71.96 | No | Complementary Relation Contrastive Distillation | 2021-03-29 | Code |
| 38 | SRD (T: ResNet-34, S: ResNet-18) | 71.87 | No | Understanding the Role of the Projector in Knowl... | 2023-03-20 | Code |
| 39 | KD++ (T: ViT-B, S: ResNet-18) | 71.84 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 40 | LSHFM (T: ResNet-34, S: ResNet-18) | 71.72 | No | Distilling Knowledge by Mimicking Features | 2020-11-03 | Code |
| 41 | ITRD (T: ResNet-34, S: ResNet-18) | 71.68 | No | Information Theoretic Representation Distillation | 2021-12-01 | Code |
| 42 | GLD (T: ResNet-34, S: ResNet-18) | 71.63 | No | - | - | Code |
| 43 | SSKD (T: ResNet-34, S: ResNet-18) | 71.62 | No | Knowledge Distillation Meets Self-Supervision | 2020-06-12 | Code |
| 44 | Knowledge Review (T: ResNet-34, S: ResNet-18) | 71.61 | No | Distilling Knowledge via Knowledge Review | 2021-04-19 | Code |
| 45 | Adaptive (T: ResNet-50, S: ResNet-18) | 71.61 | No | Adaptive Distillation: Aggregating Knowledge fro... | 2021-10-19 | Code |
| 46 | KD++ (T: ViT-S, S: ResNet-18) | 71.46 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 47 | AFD (T: ResNet-34, S: ResNet-18) | 71.38 | No | Show, Attend and Distill: Knowledge Distillation ... | 2021-02-05 | Code |
| 48 | CRD (T: ResNet-34, S: ResNet-18) | 71.38 | No | Contrastive Representation Distillation | 2019-10-23 | Code |
| 49 | Overhaul (T: ResNet-34, S: ResNet-18) | 70.81 | No | A Comprehensive Overhaul of Feature Distillation | 2019-04-03 | Code |
| 50 | KD (T: ResNet-34, S: ResNet-18) | 70.66 | No | Distilling the Knowledge in a Neural Network | 2015-03-09 | Code |
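All entries above distill a large teacher network into a smaller student. The baseline "KD" entry (Distilling the Knowledge in a Neural Network) trains the student on a weighted mix of temperature-softened teacher probabilities and hard labels. A minimal numpy sketch of that objective; the temperature `T=4.0` and weight `alpha=0.9` are illustrative values, not taken from any row of the leaderboard:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic KD: alpha * soft-target cross-entropy at temperature T
    (scaled by T^2 to keep gradient magnitudes comparable)
    + (1 - alpha) * standard hard-label cross-entropy."""
    # Soft-target term: cross-entropy between softened teacher and student.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student).sum(axis=1).mean() * T**2
    # Hard-label term: ordinary cross-entropy at T = 1.
    n = labels.shape[0]
    log_p = np.log(softmax(student_logits))
    hard = -log_p[np.arange(n), labels].mean()
    return alpha * soft + (1 - alpha) * hard

# Toy example: 2 samples, 3 classes.
student = np.array([[1.0, 0.2, 0.1], [0.1, 0.3, 1.2]])
teacher = np.array([[2.5, 0.5, 0.1], [0.2, 0.4, 2.8]])
labels = np.array([0, 2])
print(kd_loss(student, teacher, labels))
```

The `T**2` factor follows the original paper's observation that gradients of the soft-target term scale as 1/T^2, so multiplying it back keeps the two terms balanced as the temperature changes.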