Metric: Accuracy (Private) (higher is better)
| # | Model | Accuracy (Private) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | LiT-22B | 87.6 | No | Scaling Vision Transformers to 22 Billion Parame... | 2023-02-10 | Code |
| 2 | LiT ViT-e | 84.9 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 3 | CoCa | 82.7 | No | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 4 | EVA-CLIP-18B | 82.2 | No | EVA-CLIP-18B: Scaling CLIP to 18 Billion Paramet... | 2024-02-06 | Code |
| 5 | LiT-tuning | 81.1 | No | LiT: Zero-Shot Transfer with Locked-image text T... | 2021-11-15 | Code |
| 6 | InternVL-C | 80.6 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 7 | EVA-CLIP-E/14+ | 79.6 | No | EVA-CLIP: Improved Training Techniques for CLIP ... | 2023-03-27 | Code |
| 8 | CLIP | 72.3 | No | Learning Transferable Visual Models From Natural... | 2021-02-26 | Code |
| 9 | PaLI | 42.62 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |