Metric: Accuracy (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OFA | 91 | No | OFA: Unifying Architectures, Tasks, and Modaliti... | 2022-02-07 | Code |
| 2 | Prompt Tuning | 90.04 | No | Prompt Tuning for Generative Multimodal Pretrain... | 2022-08-04 | Code |
| 3 | CoCa | 87 | No | CoCa: Contrastive Captioners are Image-Text Foun... | 2022-05-04 | Code |
| 4 | SimVLM | 86.21 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 5 | SOHO | 85 | No | Seeing Out of tHe bOx: End-to-End Pre-training f... | 2021-04-07 | Code |
| 6 | CLIP-ViL | 80.2 | Yes | How Much Can CLIP Benefit Vision-and-Language Ta... | 2021-07-13 | Code |
| 7 | VILLA-LARGE | 80.18 | Yes | Large-Scale Adversarial Training for Vision-and-... | 2020-06-11 | Code |
| 8 | UNITER | 78.98 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 9 | EVE-ROI* | 70.81 | No | Visual Entailment: A Novel Task for Fine-Grained... | 2019-01-20 | Code |