Metric: ROUGE-L (higher is better)
| # | Model | ROUGE-L | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT, Single Model | 63.12 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 2 | CoCa - Google Brain | 62.52 | No | - | - | - |
| 3 | Microsoft Cognitive Services team | 61.2 | No | Scaling Up Vision-Language Pre-training for Imag... | 2021-11-24 | - |
| 4 | Prismer | 60.55 | No | Prismer: A Vision-Language Model with Multi-Task... | 2023-03-04 | Code |
| 5 | Single Model | 59.86 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 6 | FudanFVL | 59.82 | No | - | - | - |
| 7 | FudanWYZ | 59.18 | No | - | - | - |
| 8 | IEDA-LAB | 58.56 | No | - | - | - |
| 9 | firethehole | 58.25 | No | - | - | - |
| 10 | MD | 57.57 | No | - | - | - |
| 11 | vll@mk514 | 57.4 | No | - | - | - |
| 12 | VinVL (Microsoft Cognitive Services + MSR) | 56.96 | No | VinVL: Revisiting Visual Representations in Visi... | 2021-01-02 | Code |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 56.7 | No | - | - | - |
| 14 | icgp2ssi1_coco_si_0.02_5_test | 55.03 | No | - | - | - |
| 15 | evertyhing | 54.75 | No | - | - | - |
| 16 | camel XE | 54.3 | No | - | - | - |
| 17 | Oscar | 54.07 | No | - | - | - |
| 18 | RCAL | 53.85 | No | - | - | - |
| 19 | vinvl_yuan_cbs | 53.8 | No | - | - | - |
| 20 | Human | 52.83 | No | - | - | - |
| 21 | cxy_nocaps_training | 52.54 | No | - | - | - |
| 22 | MQ-UpDown-C | 52.53 | No | - | - | - |
| 23 | Xinyi | 52.35 | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 51.82 | No | - | - | - |
| 25 | nocaps_training | 50.92 | No | - | - | - |
| 26 | UpDown | 50.92 | No | - | - | - |
| 27 | 7_10-7_40000_predict_test.json | 50.4 | No | - | - | - |
| 28 | B2 | 49.97 | No | - | - | - |
| 29 | None | 49.64 | No | - | - | - |
| 30 | YX | 49.38 | No | - | - | - |
| 31 | area_attention | 49.03 | No | - | - | - |
| 32 | Neural Baby Talk | 48.87 | No | - | - | - |
| 33 | Neural Baby Talk + CBS | 48.74 | No | - | - | - |
| 34 | coco_all_19 | 47.6 | No | - | - | - |
| 35 | Yu-Wu | 46.61 | No | - | - | - |
| 36 | CS395T | 46.58 | No | - | - | - |
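
For readers unfamiliar with the ROUGE-L metric named at the top of this table, the sketch below shows the usual idea behind it: an F-score built from the longest common subsequence (LCS) of tokens shared by a candidate caption and a reference caption. It is illustrative only. The function names, the whitespace tokenization, the single-reference setup, and the default `beta=1.0` are assumptions, and the leaderboard's actual evaluation pipeline (tokenizer, multi-reference handling, weighting) is not stated here and may differ. The table's scores appear to be on a 0-100 scale, so the example multiplies by 100 for comparison.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-score for one candidate against one reference.

    Assumptions (not taken from the leaderboard): lowercased whitespace
    tokenization, a single reference, and beta=1.0 (plain F1).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    score = rouge_l("a dog runs on the grass", "a dog is running on the grass")
    print(round(100 * score, 2))  # 76.92
```

In the example, the LCS is "a dog on the grass" (5 tokens), giving precision 5/6 and recall 5/7. A corpus-level score like those in the table would typically average such per-caption values over the whole test set.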