Knowledge Distillation on ImageNet
Metric: Top-1 accuracy % (higher is better)
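For reference, top-1 accuracy is the percentage of evaluation images whose highest-scoring predicted class matches the ground-truth label. The snippet below is a minimal sketch of that computation in plain Python; the function name `top1_accuracy` and the toy scores are illustrative assumptions, not part of any submission's evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy as a percentage (hypothetical helper)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one example is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score is the model's top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data (made up for illustration): 4 examples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
labels = [1, 0, 2, 2]
accuracy = top1_accuracy(scores, labels)
print(f"{accuracy:.2f}")  # 75.00
```

Leaderboard figures are normally reported on the ImageNet validation set; the toy data above only illustrates the arithmetic.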
Results
| # | Model | Top-1 accuracy % | Extra Data | Paper | Date | SOTA |
|---|---|---|---|---|---|---|
| 1 | ScaleKD (T:BEiT-L S:ViT-B/14) | 86.43 | No | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 2024-11-11 | SOTA |
| 2 | ScaleKD (T:Swin-L S:ViT-B/16) | 85.53 | No | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 2024-11-11 | |
| 3 | ScaleKD (T:Swin-L S:ViT-S/16) | 83.93 | No | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 2024-11-11 | |
| 4 | ScaleKD (T:Swin-L S:Swin-T) | 83.8 | No | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 2024-11-11 | |
| 5 | KD++(T: regnety-16GF S:ViT-B) | 83.6 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | SOTA |
| 6 | VkD (T:RegNety 160 S:DeiT-S) | 82.9 | No | $V_kD:$ Improving Knowledge Distillation using Orthogonal Projections | 2024-03-10 | |
| 7 | SpectralKD (T:Swin-S S:Swin-T) | 82.7 | No | SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | 2024-12-26 | |
| 8 | ScaleKD (T:Swin-L S:ResNet-50) | 82.55 | No | ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | 2024-11-11 | |
| 9 | DiffKD (T:Swin-L S: Swin-T) | 82.5 | No | Knowledge Diffusion for Distillation | 2023-05-25 | SOTA |
| 10 | DIST (T: Swin-L S: Swin-T) | 82.3 | Yes | Knowledge Distillation from A Stronger Teacher | 2022-05-21 | SOTA |
| 11 | SpectralKD (T:Cait-S24 S:DeiT-S) | 82.2 | No | SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | 2024-12-26 | |
| 12 | SRD (T:RegNety 160 S:DeiT-S) | 82.1 | No | Understanding the Role of the Projector in Knowledge Distillation | 2023-03-20 | |
| 13 | OFA (T: ViT-B S: ResNet-50) | 81.33 | No | One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation | 2023-10-30 | |
| 14 | DiffKD (T:Swin-L S: ResNet-50) | 80.5 | No | Knowledge Diffusion for Distillation | 2023-05-25 | |
| 15 | VkD (T:RegNety 160 S:DeiT-Ti) | 79.2 | No | $V_kD:$ Improving Knowledge Distillation using Orthogonal Projections | 2024-03-10 | |
| 16 | KD++(T:resnet-152 S:resnet-101) | 79.15 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 17 | ADLIK-MO-P25(T:SeNet154, ResNet152b S:ResNet-50-prune25%) | 78.79 | No | Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | 2019-09-17 | SOTA |
| 18 | ADLIK-MO-P375(T:SeNet154, ResNet152b S:ResNet-50-prune37.5) | 78.07 | No | Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | 2019-09-17 | |
| 19 | KD++(T:resnet-152 S:resnet-50) | 77.48 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 20 | SpectralKD (T:Cait-S24 S:DeiT-T) | 77.4 | No | SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | 2024-12-26 | |
| 21 | SRD (T:RegNety 160 S:DeiT-Ti) | 77.2 | No | Understanding the Role of the Projector in Knowledge Distillation | 2023-03-20 | |
| 22 | ADLIK-MO(T: ResNet101 S: ResNet50) | 77.14 | No | Distilling the Knowledge in a Neural Network | 2015-03-09 | SOTA |
| 23 | WTTM (T: DeiT III-Small S:DeiT-Tiny) | 77.03 | No | Knowledge Distillation Based on Transformed Teacher Matching | 2024-02-17 | |
| 24 | ADLIK-MO-P50(T:SeNet154, ResNet152b S:ResNet-50-half) | 76.376 | No | Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | 2019-09-17 | |
| 25 | KD++(T:resnet152 S:resnet34) | 75.53 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 26 | WTTM (T:resnet50, S:mobilenet-v1) | 73.09 | No | Knowledge Distillation Based on Transformed Teacher Matching | 2024-02-17 | |
| 27 | ReviewKD++(T:resnet50, S:mobilenet-v1) | 72.96 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 28 | KD++(T:resnet-152 S:resnet18) | 72.54 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 29 | KD++(T:resnet101 S:resnet18) | 72.54 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 30 | KD++(T:resnet50 S:resnet18) | 72.53 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 31 | HSAKD (T: ResNet-34 S:ResNet-18) | 72.39 | No | Hierarchical Self-supervised Augmented Knowledge Distillation | 2021-07-29 | |
| 32 | ICKD (T: ResNet-34 S:ResNet-18) | 72.19 | No | No paper | - | |
| 33 | WTTM (T: ResNet-34 S:ResNet-18) | 72.19 | No | Knowledge Distillation Based on Transformed Teacher Matching | 2024-02-17 | |
| 34 | DIST (T: ResNet-34 S:ResNet-18) | 72.07 | No | Knowledge Distillation from A Stronger Teacher | 2022-05-21 | |
| 35 | KD++(T: ResNet-34 S:ResNet-18) | 72.07 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 36 | WSL (T: ResNet-34 S:ResNet-18) | 72.04 | No | Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective | 2021-02-01 | |
| 37 | CRCD (T: ResNet-34 S:ResNet-18) | 71.96 | No | Complementary Relation Contrastive Distillation | 2021-03-29 | |
| 38 | SRD (T: ResNet-34 S:ResNet-18) | 71.87 | No | Understanding the Role of the Projector in Knowledge Distillation | 2023-03-20 | |
| 39 | KD++(T:ViT-B, S:resnet18) | 71.84 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 40 | LSHFM (T: ResNet-34 S:ResNet-18) | 71.72 | No | Distilling Knowledge by Mimicking Features | 2020-11-03 | |
| 41 | ITRD (T: ResNet-34 S:ResNet-18) | 71.68 | No | Information Theoretic Representation Distillation | 2021-12-01 | |
| 42 | GLD (T: ResNet-34 S:ResNet-18) | 71.63 | No | No paper | - | |
| 43 | SSKD (T: ResNet-34 S:ResNet-18) | 71.62 | No | Knowledge Distillation Meets Self-Supervision | 2020-06-12 | |
| 44 | Knowledge Review (T: ResNet-34 S:ResNet-18) | 71.61 | No | Distilling Knowledge via Knowledge Review | 2021-04-19 | |
| 45 | Adaptive (T:ResNet-50 S:ResNet-18) | 71.61 | No | Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation | 2021-10-19 | |
| 46 | KD++(T: ViT-S, S:resnet18) | 71.46 | No | Improving Knowledge Distillation via Regularizing Feature Norm and Direction | 2023-05-26 | |
| 47 | AFD (T: ResNet-34 S:ResNet-18) | 71.38 | No | Show, Attend and Distill:Knowledge Distillation via Attention-based Feature Matching | 2021-02-05 | |
| 48 | CRD (T: ResNet-34 S:ResNet-18) | 71.38 | No | Contrastive Representation Distillation | 2019-10-23 | |
| 49 | Overhaul (T: ResNet-34 S:ResNet-18) | 70.81 | No | A Comprehensive Overhaul of Feature Distillation | 2019-04-03 | |
| 50 | KD (T: ResNet-34 S:ResNet-18) | 70.66 | No | Distilling the Knowledge in a Neural Network | 2015-03-09 | |
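Every entry above pairs a teacher (T) with a student (S). As background for the baseline KD row, the following is a minimal sketch of the classic soft-target distillation loss from "Distilling the Knowledge in a Neural Network", written in plain Python for a single example. The function names and the default `temperature` and `alpha` values are illustrative assumptions; the other methods on the leaderboard use their own objectives, which this sketch does not reproduce.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # assumed value, chosen for illustration
    alpha: float = 0.9,        # assumed weight on the soft-target term
) -> float:
    """Weighted sum of the soft-target KL term and hard-label cross-entropy."""
    # Hard-label cross-entropy uses the student's ordinary (T = 1) softmax.
    ce = -math.log(softmax(student_logits)[label])
    # Soft targets: both distributions are softened with the same temperature.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


teacher = [2.0, 1.0, 0.1]
matching_student = [2.0, 1.0, 0.1]
mismatched_student = [0.1, 1.0, 2.0]
print(kd_loss(matching_student, teacher, label=0))
print(kd_loss(mismatched_student, teacher, label=0))
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains, so the first printed loss is smaller than the second.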