Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | BASIC (Lion) | 96.8 | No | - | - | - |
| 2 | CoCa | 96.5 | No | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 3 | LiT ViT-e | 96.1 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 4 | LiT-22B | 96 | No | Scaling Vision Transformers to 22 Billion Parame... | 2023-02-10 | Code |
| 5 | BASIC | 95.7 | No | Combined Scaling for Zero-shot Transfer Learning | 2021-11-19 | - |
| 6 | EVA-CLIP-18B | 95.7 | No | EVA-CLIP-18B: Scaling CLIP to 18 Billion Paramet... | 2024-02-06 | Code |
| 7 | EVA-CLIP-E/14+ | 94.5 | No | EVA-CLIP: Improved Training Techniques for CLIP ... | 2023-03-27 | Code |
| 8 | LiT-tuning | 93.9 | No | LiT: Zero-Shot Transfer with Locked-image text T... | 2021-11-15 | Code |
| 9 | ALIGN | 92.2 | No | Scaling Up Visual and Vision-Language Representa... | 2021-02-11 | Code |
| 10 | CLIP | 88.9 | No | Learning Transferable Visual Models From Natural... | 2021-02-26 | Code |
| 11 | AltCLIP | 87.2 | No | AltCLIP: Altering the Language Encoder in CLIP f... | 2022-11-12 | Code |
| 12 | PaLI | 81.97 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |