| 1 | CoCa | 82.7 | Yes | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 2 | LiT | 82.5 | Yes | LiT: Zero-Shot Transfer with Locked-image text T... | 2021-11-15 | Code |
| 3 | BASIC | 82.3 | Yes | Combined Scaling for Zero-shot Transfer Learning | 2021-11-19 | - |
| 4 | EVA-02-CLIP-E/14+ | 79.6 | Yes | EVA-CLIP: Improved Training Techniques for CLIP ... | 2023-03-27 | Code |
| 5 | Baseline (ViT-G/14) | 79.03 | Yes | Model soups: averaging weights of multiple fine-... | 2022-03-10 | Code |
| 6 | Model soups (ViT-G/14) | 78.52 | Yes | Model soups: averaging weights of multiple fine-... | 2022-03-10 | Code |
| 7 | MAWS (ViT-6.5B) | 77.9 | Yes | The effectiveness of MAE pre-pretraining for bil... | 2023-03-23 | Code |
| 8 | MAWS (ViT-2B) | 75.8 | Yes | The effectiveness of MAE pre-pretraining for bil... | 2023-03-23 | Code |
| 9 | MAWS (ViT-H) | 72.6 | Yes | The effectiveness of MAE pre-pretraining for bil... | 2023-03-23 | Code |
| 10 | CLIP | 72.3 | Yes | Learning Transferable Visual Models From Natural... | 2021-02-26 | Code |
| 11 | ALIGN | 72.2 | Yes | Combined Scaling for Zero-shot Transfer Learning | 2021-11-19 | - |
| 12 | WiSE-FT | 72.1 | Yes | Robust fine-tuning of zero-shot models | 2021-09-04 | Code |
| 13 | ViT-e | 72 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 14 | ViT-G/14 | 70.53 | Yes | Scaling Vision Transformers | 2021-06-08 | Code |
| 15 | SWAG (ViT H/14) | 69.5 | Yes | Revisiting Weakly Supervised Pre-Training of Vis... | 2022-01-20 | Code |
| 16 | NS (Eff.-L2) | 68.5 | Yes | Scaling Vision Transformers | 2021-06-08 | Code |
| 17 | RegNetY 128GF (Platt) | 64.3 | Yes | Revisiting Weakly Supervised Pre-Training of Vis... | 2022-01-20 | Code |
| 18 | LLE (ViT-H/14, MAE, Edge Aug) | 60.78 | No | A Whac-A-Mole Dilemma: Shortcuts Come in Multipl... | 2022-12-09 | Code |
| 19 | SEER (RegNet10B) | 60.2 | Yes | Vision Models Are More Robust And Fair When Pret... | 2022-02-16 | Code |
| 20 | ViT H/14 (Platt) | 60 | Yes | Revisiting Weakly Supervised Pre-Training of Vis... | 2022-01-20 | Code |
| 21 | BiT-L (ResNet-152x4) | 58.7 | Yes | Big Transfer (BiT): General Visual Representatio... | 2019-12-24 | Code |
| 22 | ViT L/16 (Platt) | 57.3 | Yes | Revisiting Weakly Supervised Pre-Training of Vis... | 2022-01-20 | Code |
| 23 | ViT B/16 (Bamboo) | 53.9 | Yes | Bamboo: Building Mega-Scale Vision Dataset Conti... | 2022-03-15 | Code |
| 24 | AR-L (Opt Relevance) | 52 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 25 | ALIGN-MRL | 51.6 | Yes | Matryoshka Representation Learning | 2022-05-26 | Code |
| 26 | ViT-B/16 (ANN-1.3B) | 50.7 | Yes | Billion-Scale Pretraining with Vision Transforme... | 2021-08-12 | - |
| 27 | ViT-B/16 (512x512) + Pyramid | 49.39 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 28 | ResNet-101 (JFT-300M) | 49.1 | Yes | Billion-Scale Pretraining with Vision Transforme... | 2021-08-12 | - |
| 29 | ViT B/16 | 48.9 | Yes | Revisiting Weakly Supervised Pre-Training of Vis... | 2022-01-20 | Code |
| 30 | ViT-B/32 | 48.4 | Yes | Billion-Scale Pretraining with Vision Transforme... | 2021-08-12 | - |
| 31 | ViT-B/16 (512x512) + Pixel | 47.53 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 32 | AR-B (Opt Relevance) | 47.1 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 33 | BiT-M (ResNet-152x4) | 47 | Yes | Big Transfer (BiT): General Visual Representatio... | 2019-12-24 | Code |
| 34 | ViT-B/16 (512x512) | 46.68 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 35 | ViT-B (Discrete 512x512) | 46.62 | Yes | Discrete Representations Strengthen Vision Trans... | 2021-11-20 | Code |
| 36 | AR-L | 46.5 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 37 | ViT-L (Opt Relevance) | 43.2 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 38 | CLIP L | 42.8 | Yes | Optimal Representations for Covariate Shift | 2021-12-31 | Code |
| 39 | ResNet-50 (JFT-300M) | 42.5 | Yes | Billion-Scale Pretraining with Vision Transforme... | 2021-08-12 | - |
| 40 | ViT-B (Opt Relevance) | 42.2 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 41 | CLIP L (LAION) | 42.1 | Yes | Optimal Representations for Covariate Shift | 2021-12-31 | Code |
| 42 | AR-B | 41.4 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 43 | RegViT on 384x384 + Adv Pyramid | 39.79 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 44 | ResNet-152 + GenInt with Transfer | 39.38 | Yes | Generative Interventions for Causal Learning | 2020-12-22 | Code |
| 45 | AR-S (Opt Relevance) | 39.3 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 46 | ResNet-50 (Bamboo) | 38.8 | Yes | Bamboo: Building Mega-Scale Vision Dataset Conti... | 2022-03-15 | Code |
| 47 | RegViT on 384x384 + Adv Pixel | 37.41 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 48 | ViT-L | 37.4 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 49 | DeiT-L (Opt Relevance) | 36.3 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 50 | BiT-S (ResNet-152x4) | 36 | Yes | Big Transfer (BiT): General Visual Representatio... | 2019-12-24 | Code |
| 51 | NASNet-A | 35.77 | Yes | - | - | - |
| 52 | PNASNet-5L | 35.63 | Yes | - | - | - |
| 53 | RegViT on 384x384 | 35.59 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 54 | ViT-B | 35.1 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 55 | RegViT on 384x384 + Random Pyramid | 34.83 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 56 | AR-S | 34.3 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 57 | RegViT on 384x384 + Random Pixel | 34.12 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 58 | RegViT (RandAug) + Adv Pyramid | 32.92 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 59 | Inception-v4 | 32.24 | Yes | - | - | - |
| 60 | DeiT-S (Opt Relevance) | 31.6 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 61 | ResNet-50 + CGC | 31.53 | Yes | Context-Gated Convolution | 2019-10-12 | Code |
| 62 | DeiT-L | 31.4 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 63 | Discrete ViT + Pixel | 30.98 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 64 | Discrete ViT + Pyramid | 30.28 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 65 | RegViT (RandAug) + Adv Pixel | 30.11 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 66 | Discrete ViT | 29.95 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 67 | ResNet-152 | 29.59 | Yes | - | - | - |
| 68 | RegViT (RandAug) + Random Pyramid | 29.41 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 69 | RegViT (RandAug) | 29.3 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 70 | ResNet-50 + GroupNorm | 29.2 | Yes | Improving robustness against common corruptions ... | 2020-06-30 | Code |
| 71 | ResNet-50 + RoHL | 29.2 | Yes | Improving robustness against common corruptions ... | 2020-06-30 | Code |
| 72 | RegViT (RandAug) + Random Pixel | 28.72 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 73 | MLP-Mixer + Pyramid | 28.6 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 74 | ResNet-50 + FixUp | 28.5 | Yes | Improving robustness against common corruptions ... | 2020-06-30 | Code |
| 75 | ResNet-50 + MixUp (rescaled) | 28.37 | Yes | On Mixup Regularization | 2020-06-10 | Code |
| 76 | DeiT-S | 28.3 | Yes | Optimizing Relevance Maps of Vision Transformers... | 2022-06-02 | Code |
| 77 | ResNet-18 + GenInt with Transfer | 27.03 | Yes | Generative Interventions for Causal Learning | 2020-12-22 | Code |
| 78 | MLP-Mixer | 25.9 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 79 | RELICv2 | 25.9 | Yes | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 80 | ViT + MixUp | 25.65 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 81 | C-BYOL | 25.5 | Yes | Compressive Visual Representations | 2021-09-27 | Code |
| 82 | MLP-Mixer + Pixel | 24.75 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 83 | BYOL (BG_RM) | 23.9 | Yes | Characterizing and Improving the Robustness of S... | 2021-03-23 | - |
| 84 | RELIC | 23.8 | Yes | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 85 | BYOL | 23 | Yes | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 86 | SwAV (BG_RM) | 21.9 | Yes | Characterizing and Improving the Robustness of S... | 2021-03-23 | - |
| 87 | ViT + CutMix | 21.61 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 88 | MoCo-v2 (BG_Swaps) | 20.8 | Yes | Characterizing and Improving the Robustness of S... | 2021-03-23 | - |
| 89 | C-SimCLR | 20.8 | Yes | Compressive Visual Representations | 2021-09-27 | Code |
| 90 | SeLa(v2) (reverse linear probing) | 20.61 | Yes | - | - | - |
| 91 | DILEMMA | 20.51 | Yes | Representation Learning by Detecting Incorrect L... | 2022-04-10 | Code |
| 92 | DeepCluster(v2) (reverse linear probing) | 19.73 | Yes | - | - | - |
| 93 | VGG-14 | 19.13 | Yes | - | - | - |
| 94 | ResNet-50 (ImageNet-Captions) | 18.7 | Yes | Data Determines Distributional Robustness in Con... | 2022-05-03 | Code |
| 95 | SwAV (reverse linear probing) | 17.71 | Yes | - | - | - |
| 96 | ViT | 17.36 | Yes | Pyramid Adversarial Training Improves ViT Perfor... | 2021-11-30 | Code |
| 97 | ResNet34-RPG | 16.5 | Yes | Compact and Optimal Deep Learning with Recurrent... | 2021-07-15 | Code |
| 98 | CLIP (CC12M pretrain) | 15.24 | Yes | Robust Cross-Modal Representation Learning with ... | 2022-04-10 | - |
| 99 | SimCLR | 14.6 | Yes | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 100 | ResNet-152 (FRCNN-ag-ad, VOC) | 13.2 | Yes | Class-agnostic Object Detection | 2020-11-28 | - |
| 101 | MoCo(v2) (reverse linear probing) | 12.67 | Yes | - | - | - |
| 102 | MoCHi (reverse linear probing) | 12.64 | Yes | - | - | - |
| 103 | OBoW (reverse linear probing) | 12.23 | Yes | - | - | - |
| 104 | AlexNet | 6.78 | Yes | - | - | - |
| 105 | BigBiGAN (RevNet-50 4×) | 4.92 | Yes | Self-Supervised Learning for Large-Scale Unsuper... | 2020-08-24 | Code |