Metric: Top-1 accuracy % (higher is better)
| # | Model | Top-1 accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ScaleKD (T:BEiT-L S:ViT-B/14) | 86.43 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 2 | ScaleKD (T:Swin-L S:ViT-B/16) | 85.53 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 3 | ScaleKD (T:Swin-L S:ViT-S/16) | 83.93 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 4 | ScaleKD (T:Swin-L S:Swin-T) | 83.8 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 5 | KD++(T: regnety-16GF S:ViT-B) | 83.6 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 6 | VkD (T:RegNety 160 S:DeiT-S) | 82.9 | No | $V_kD:$ Improving Knowledge Distillation using O... | 2024-03-10 | Code |
| 7 | SpectralKD (T:Swin-S S:Swin-T) | 82.7 | No | SpectralKD: A Unified Framework for Interpreting... | 2024-12-26 | Code |
| 8 | ScaleKD (T:Swin-L S:ResNet-50) | 82.55 | No | ScaleKD: Strong Vision Transformers Could Be Exc... | 2024-11-11 | Code |
| 9 | DiffKD (T:Swin-L S: Swin-T) | 82.5 | No | Knowledge Diffusion for Distillation | 2023-05-25 | Code |
| 10 | DIST (T: Swin-L S: Swin-T) | 82.3 | Yes | Knowledge Distillation from A Stronger Teacher | 2022-05-21 | Code |
| 11 | SpectralKD (T:Cait-S24 S:DeiT-S) | 82.2 | No | SpectralKD: A Unified Framework for Interpreting... | 2024-12-26 | Code |
| 12 | SRD (T:RegNety 160 S:DeiT-S) | 82.1 | No | Understanding the Role of the Projector in Knowl... | 2023-03-20 | Code |
| 13 | OFA (T: ViT-B S: ResNet-50) | 81.33 | No | One-for-All: Bridge the Gap Between Heterogeneou... | 2023-10-30 | Code |
| 14 | DiffKD (T:Swin-L S: ResNet-50) | 80.5 | No | Knowledge Diffusion for Distillation | 2023-05-25 | Code |
| 15 | VkD (T:RegNety 160 S:DeiT-Ti) | 79.2 | No | $V_kD:$ Improving Knowledge Distillation using O... | 2024-03-10 | Code |
| 16 | KD++(T:resnet-152 S:resnet-101) | 79.15 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 17 | ADLIK-MO-P25 (T:SENet-154, ResNet152b S:ResNet-50-prune25%) | 78.79 | No | Ensemble Knowledge Distillation for Learning Imp... | 2019-09-17 | Code |
| 18 | ADLIK-MO-P375 (T:SENet-154, ResNet152b S:ResNet-50-prune37.5%) | 78.07 | No | Ensemble Knowledge Distillation for Learning Imp... | 2019-09-17 | Code |
| 19 | KD++(T:resnet-152 S:resnet-50) | 77.48 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 20 | SpectralKD (T:Cait-S24 S:DeiT-T) | 77.4 | No | SpectralKD: A Unified Framework for Interpreting... | 2024-12-26 | Code |
| 21 | SRD (T:RegNety 160 S:DeiT-Ti) | 77.2 | No | Understanding the Role of the Projector in Knowl... | 2023-03-20 | Code |
| 22 | ADLIK-MO(T: ResNet101 S: ResNet50) | 77.14 | No | Distilling the Knowledge in a Neural Network | 2015-03-09 | Code |
| 23 | WTTM (T: DeiT III-Small S:DeiT-Tiny) | 77.03 | No | Knowledge Distillation Based on Transformed Teac... | 2024-02-17 | Code |
| 24 | ADLIK-MO-P50 (T:SENet-154, ResNet152b S:ResNet-50-half) | 76.376 | No | Ensemble Knowledge Distillation for Learning Imp... | 2019-09-17 | Code |
| 25 | KD++(T:resnet152 S:resnet34) | 75.53 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 26 | WTTM (T:resnet50, S:mobilenet-v1) | 73.09 | No | Knowledge Distillation Based on Transformed Teac... | 2024-02-17 | Code |
| 27 | ReviewKD++(T:resnet50, S:mobilenet-v1) | 72.96 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 28 | KD++(T:resnet-152 S:resnet18) | 72.54 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 29 | KD++(T:resnet101 S:resnet18) | 72.54 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 30 | KD++(T:resnet50 S:resnet18) | 72.53 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 31 | HSAKD (T: ResNet-34 S:ResNet-18) | 72.39 | No | Hierarchical Self-supervised Augmented Knowledge... | 2021-07-29 | Code |
| 32 | ICKD (T: ResNet-34 S:ResNet-18) | 72.19 | No | - | - | Code |
| 33 | WTTM (T: ResNet-34 S:ResNet-18) | 72.19 | No | Knowledge Distillation Based on Transformed Teac... | 2024-02-17 | Code |
| 34 | DIST (T: ResNet-34 S:ResNet-18) | 72.07 | No | Knowledge Distillation from A Stronger Teacher | 2022-05-21 | Code |
| 35 | KD++(T: ResNet-34 S:ResNet-18) | 72.07 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 36 | WSL (T: ResNet-34 S:ResNet-18) | 72.04 | No | Rethinking Soft Labels for Knowledge Distillatio... | 2021-02-01 | Code |
| 37 | CRCD (T: ResNet-34 S:ResNet-18) | 71.96 | No | Complementary Relation Contrastive Distillation | 2021-03-29 | Code |
| 38 | SRD (T: ResNet-34 S:ResNet-18) | 71.87 | No | Understanding the Role of the Projector in Knowl... | 2023-03-20 | Code |
| 39 | KD++(T:ViT-B, S:resnet18) | 71.84 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 40 | LSHFM (T: ResNet-34 S:ResNet-18) | 71.72 | No | Distilling Knowledge by Mimicking Features | 2020-11-03 | Code |
| 41 | ITRD (T: ResNet-34 S:ResNet-18) | 71.68 | No | Information Theoretic Representation Distillation | 2021-12-01 | Code |
| 42 | GLD (T: ResNet-34 S:ResNet-18) | 71.63 | No | - | - | Code |
| 43 | SSKD (T: ResNet-34 S:ResNet-18) | 71.62 | No | Knowledge Distillation Meets Self-Supervision | 2020-06-12 | Code |
| 44 | Knowledge Review (T: ResNet-34 S:ResNet-18) | 71.61 | No | Distilling Knowledge via Knowledge Review | 2021-04-19 | Code |
| 45 | Adaptive (T:ResNet-50 S:ResNet-18) | 71.61 | No | Adaptive Distillation: Aggregating Knowledge fro... | 2021-10-19 | Code |
| 46 | KD++(T: ViT-S, S:resnet18) | 71.46 | No | Improving Knowledge Distillation via Regularizin... | 2023-05-26 | Code |
| 47 | AFD (T: ResNet-34 S:ResNet-18) | 71.38 | No | Show, Attend and Distill:Knowledge Distillation ... | 2021-02-05 | Code |
| 48 | CRD (T: ResNet-34 S:ResNet-18) | 71.38 | No | Contrastive Representation Distillation | 2019-10-23 | Code |
| 49 | Overhaul (T: ResNet-34 S:ResNet-18) | 70.81 | No | A Comprehensive Overhaul of Feature Distillation | 2019-04-03 | Code |
| 50 | KD (T: ResNet-34 S:ResNet-18) | 70.66 | No | Distilling the Knowledge in a Neural Network | 2015-03-09 | Code |
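The baseline in the last row is the classic soft-label objective from "Distilling the Knowledge in a Neural Network": the student matches the teacher's temperature-softened class distribution, with the loss scaled by T² to keep gradient magnitudes comparable across temperatures. A minimal pure-Python sketch of that distillation term (the function names and the choice of T=4 are illustrative, not taken from any listed paper's code):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # multiplied by T^2 as in Hinton et al. (2015).
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # soft student predictions
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary cross-entropy on ground-truth labels; later entries in the table (DIST, DiffKD, ScaleKD, ...) replace or augment this logit-matching term with relation-, feature-, or diffusion-based objectives.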