Metric: B3 (higher is better)
| # | Model | B3 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT, Single Model | 60.53 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 2 | GIT2, Single Model | 59.94 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 3 | PaLI | 59.38 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 4 | CoCa - Google Brain | 58.01 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 55.94 | No | VIVO: Visual Vocabulary Pre-Training for Novel O... | 2020-09-28 | - |
| 6 | Single Model | 52.96 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 7 | FudanFVL | 52.56 | No | - | - | - |
| 8 | IEDA-LAB | 51.89 | No | - | - | - |
| 9 | vll@mk514 | 51.26 | No | - | - | - |
| 10 | MD | 51.16 | No | - | - | - |
| 11 | FudanWYZ | 50.75 | No | - | - | - |
| 12 | firethehole | 50.50 | No | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 49.73 | No | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 49.68 | No | VinVL: Revisiting Visual Representations in Visi... | 2021-01-02 | Code |
| 15 | camel XE | 46.46 | No | - | - | - |
| 16 | RCAL | 45.33 | No | - | - | - |
| 17 | icgp2ssi1_coco_si_0.02_5_test | 44.65 | No | - | - | - |
| 18 | evertyhing | 43.92 | No | - | - | - |
| 19 | cxy_nocaps_training | 43.43 | No | - | - | - |
| 20 | Test file provided by the author | 43.43 | No | - | - | - |
| 21 | Xinyi | 43.22 | No | - | - | - |
| 22 | Oscar | 42.86 | No | - | - | - |
| 23 | MQ-UpDown-C | 42.35 | No | - | - | - |
| 24 | UpDown | 41.50 | No | - | - | - |
| 25 | nocaps_training | 41.50 | No | - | - | - |
| 26 | B2 | 40.54 | No | - | - | - |
| 27 | UpDown + ELMo + CBS | 39.86 | No | - | - | - |
| 28 | YX | 39.28 | No | - | - | - |
| 29 | area_attention | 38.44 | No | - | - | - |
| 30 | 7_10-7_40000_predict_test.json | 37.85 | No | - | - | - |
| 31 | Human | 37.78 | No | - | - | - |
| 32 | None | 36.12 | No | - | - | - |
| 33 | Neural Baby Talk | 35.58 | No | - | - | - |
| 34 | coco_all_19 | 34.13 | No | - | - | - |
| 35 | Neural Baby Talk + CBS | 33.73 | No | - | - | - |
| 36 | Yu-Wu | 31.92 | No | - | - | - |
| 37 | CS395T | 29.57 | No | - | - | - |