Metric: B1 (BLEU-1; higher is better)
| # | Model | B1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT, Single Model | 88.1 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 2 | CoCa - Google Brain | 87.01 | No | - | - | - |
| 3 | Microsoft Cognitive Services team | 85.62 | No | Scaling Up Vision-Language Pre-training for Imag... | 2021-11-24 | - |
| 4 | Prismer | 84.87 | No | Prismer: A Vision-Language Model with Multi-Task... | 2023-03-04 | Code |
| 5 | FudanFVL | 83.9 | No | - | - | - |
| 6 | Single Model | 83.78 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 7 | IEDA-LAB | 83.25 | No | - | - | - |
| 8 | FudanWYZ | 82.95 | No | - | - | - |
| 9 | MD | 82.43 | No | - | - | - |
| 10 | vll@mk514 | 81.61 | No | - | - | - |
| 11 | VinVL (Microsoft Cognitive Services + MSR) | 81.59 | No | VinVL: Revisiting Visual Representations in Visi... | 2021-01-02 | Code |
| 12 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.03 | No | - | - | - |
| 13 | firethehole | 80.77 | No | - | - | - |
| 14 | Oscar | 79.57 | No | - | - | - |
| 15 | vinvl_yuan_cbs | 79.32 | No | - | - | - |
| 16 | icgp2ssi1_coco_si_0.02_5_test | 79 | No | - | - | - |
| 17 | evertyhing | 78.92 | No | - | - | - |
| 18 | cxy_nocaps_training | 78.75 | No | - | - | - |
| 19 | Xinyi | 78.58 | No | - | - | - |
| 20 | RCAL | 78.19 | No | - | - | - |
| 21 | camel XE | 77.97 | No | - | - | - |
| 22 | MQ-UpDown-C | 76.89 | No | - | - | - |
| 23 | Human | 76.64 | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 76.59 | No | - | - | - |
| 25 | nocaps_training | 74 | No | - | - | - |
| 26 | UpDown | 74 | No | - | - | - |
| 27 | Neural Baby Talk + CBS | 73.42 | No | - | - | - |
| 28 | B2 | 73.04 | No | - | - | - |
| 29 | YX | 72.78 | No | - | - | - |
| 30 | 7_10-7_40000_predict_test.json | 72.49 | No | - | - | - |
| 31 | Neural Baby Talk | 72.33 | No | - | - | - |
| 32 | area_attention | 72.02 | No | - | - | - |
| 33 | None | 71.69 | No | - | - | - |
| 34 | coco_all_19 | 69.44 | No | - | - | - |
| 35 | CS395T | 69.07 | No | - | - | - |
| 36 | Yu-Wu | 67.85 | No | - | - | - |