Image Classification on ImageNet V2
Metric: Top 1 Accuracy (higher is better)
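A minimal sketch of how a Top 1 Accuracy figure like the ones listed here is typically computed, assuming per-class scores and integer ground-truth labels for each test image. The function name `top1_accuracy` and the toy inputs are illustrative assumptions, not taken from any listed paper's evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of examples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = 0
    for row, label in zip(scores, labels):
        # argmax over class scores; on ties the lowest class index wins
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example (hypothetical data): 3 of 4 predictions match their labels.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
toy_labels = [1, 0, 2, 1]
print(top1_accuracy(toy_scores, toy_labels))  # 75.0
```

The result is expressed as a percentage so that it is on the same scale as the leaderboard values.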
Results
| # | Model | Top 1 Accuracy | Extra Data | SOTA | Date | Paper |
|---|-------|----------------|------------|------|------|-------|
| 1 | Model soups (BASIC-L) | 84.63 | Yes | SOTA | 2022-03-10 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| 2 | ViT-e | 84.3 | No | | 2022-09-14 | PaLI: A Jointly-Scaled Multilingual Language-Image Model |
| 3 | Model soups (ViT-G/14) | 84.22 | Yes | | 2022-03-10 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| 4 | MAWS (ViT-6.5B) | 84 | Yes | | 2023-03-23 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| 5 | ViT-G/14 | 83.33 | Yes | SOTA | 2021-06-08 | Scaling Vision Transformers |
| 6 | MAWS (ViT-2B) | 83 | Yes | | 2023-03-23 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| 7 | MOAT-4 (IN-22K pretraining) | 81.5 | No | | 2022-10-04 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| 8 | SWAG (ViT H/14) | 81.1 | Yes | | 2022-01-20 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |
| 9 | MOAT-3 (IN-22K pretraining) | 80.6 | No | | 2022-10-04 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| 10 | MOAT-2 (IN-22K pretraining) | 79.3 | No | | 2022-10-04 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| 11 | MOAT-1 (IN-22K pretraining) | 78.4 | No | | 2022-10-04 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| 12 | SwinV2-B | 78.08 | No | | 2021-11-18 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| 13 | VOLO-D5 | 78 | No | | 2021-06-24 | VOLO: Vision Outlooker for Visual Recognition |
| 14 | VOLO-D4 | 77.8 | No | | 2021-06-24 | VOLO: Vision Outlooker for Visual Recognition |
| 15 | CAIT-M36-448 | 76.7 | No | SOTA | 2021-03-31 | Going deeper with Image Transformers |
| 16 | SEER (RegNet10B) | 76.2 | Yes | | 2022-02-16 | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision |
| 17 | ResMLP-B24/8 22k | 74.2 | Yes | | 2021-05-07 | ResMLP: Feedforward networks for image classification with data-efficient training |
| 18 | ViT-B-36x1 | 73.9 | No | | 2022-03-18 | Three things everyone should know about Vision Transformers |
| 19 | ResMLP-B24/8 | 73.4 | No | | 2021-05-07 | ResMLP: Feedforward networks for image classification with data-efficient training |
| 20 | Sequencer2D-L | 73.4 | No | | 2022-05-04 | Sequencer: Deep LSTM for Image Classification |
| 21 | Discrete Adversarial Distillation (ViT-B, 224) | 71.7 | No | | 2023-11-02 | Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models |
| 22 | LeViT-384 | 71.4 | No | | 2021-04-02 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| 23 | LeViT-256 | 69.9 | No | | 2021-04-02 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| 24 | ResMLP-S24/16 | 69.8 | No | | 2021-05-07 | ResMLP: Feedforward networks for image classification with data-efficient training |
| 25 | ResNet-152x2-SAM | 69.6 | Yes | | 2021-06-03 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| 26 | LeViT-192 | 68.7 | No | | 2021-04-02 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| 27 | ResNet50 (A1) | 68.7 | No | | 2021-10-01 | ResNet strikes back: An improved training procedure in timm |
| 28 | LeViT-128 | 67.5 | No | | 2021-04-02 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |
| 29 | ViT-B/16-SAM | 67.5 | Yes | | 2021-06-03 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| 30 | ResMLP-S12/16 | 66 | No | | 2021-05-07 | ResMLP: Feedforward networks for image classification with data-efficient training |
| 31 | Mixer-B/8-SAM | 65.5 | Yes | | 2021-06-03 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| 32 | LeViT-128S | 63.9 | No | | 2021-04-02 | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference |