Metric: Accuracy (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OFA | 91.2 | No | OFA: Unifying Architectures, Tasks, and Modaliti... | 2022-02-07 | Code |
| 2 | Prompt Tuning | 90.12 | No | Prompt Tuning for Generative Multimodal Pretrain... | 2022-08-04 | Code |
| 3 | CoCa | 87.1 | No | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 4 | SimVLM | 86.32 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 5 | SOHO | 84.95 | No | Seeing Out of tHe bOx: End-to-End Pre-training f... | 2021-04-07 | Code |
| 6 | MAD (Single Model, Formerly CLIP-TD) | 80.32 | No | Multimodal Adaptive Distillation for Leveraging ... | 2022-04-22 | - |
| 7 | UNITER (Large) | 78.98 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 8 | EVE-ROI* | 70.47 | No | Visual Entailment: A Novel Task for Fine-Grained... | 2019-01-20 | Code |