Metric: METEOR (higher is better)
| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaLI | 30.99 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 2 | GIT, Single Model | 30.45 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 3 | CoCa - Google Brain | 30.18 | No | - | - | - |
| 4 | GIT2, Single Model | 30.15 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 5 | Microsoft Cognitive Services team | 28.17 | No | VIVO: Visual Vocabulary Pre-Training for Novel O... | 2020-09-28 | - |
| 6 | FudanFVL | 28.13 | No | - | - | - |
| 7 | Single Model | 27.91 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 8 | FudanWYZ | 27.75 | No | - | - | - |
| 9 | firethehole | 27.39 | No | - | - | - |
| 10 | Human | 26.83 | No | - | - | - |
| 11 | IEDA-LAB | 25.55 | No | - | - | - |
| 12 | vll@mk514 | 24.50 | No | - | - | - |
| 13 | icgp2ssi1_coco_si_0.02_5_test | 24.01 | No | - | - | - |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 23.88 | No | - | - | - |
| 15 | MD | 23.79 | No | - | - | - |
| 16 | evertyhing | 23.69 | No | - | - | - |
| 17 | VinVL (Microsoft Cognitive Services + MSR) | 23.55 | No | VinVL: Revisiting Visual Representations in Visi... | 2021-01-02 | Code |
| 18 | vinvl_yuan_cbs | 22.18 | No | - | - | - |
| 19 | RCAL | 22.04 | No | - | - | - |
| 20 | Oscar | 21.73 | No | - | - | - |
| 21 | UpDown-C | 21.73 | No | - | - | - |
| 22 | cxy_nocaps_training | 21.65 | No | - | - | - |
| 23 | Xinyi | 21.57 | No | - | - | - |
| 24 | camel XE | 21.55 | No | - | - | - |
| 25 | UpDown + ELMo + CBS | 20.88 | No | - | - | - |
| 26 | 7_10-7_40000_predict_test.json | 19.95 | No | - | - | - |
| 27 | Neural Baby Talk + CBS | 19.04 | No | - | - | - |
| 28 | Neural Baby Talk | 18.31 | No | - | - | - |
| 29 | nocaps_training | 18.29 | No | - | - | - |
| 30 | UpDown | 18.29 | No | - | - | - |
| 31 | Check | 17.94 | No | - | - | - |
| 32 | B2 | 17.48 | No | - | - | - |
| 33 | area_attention | 17.43 | No | - | - | - |
| 34 | YX | 17.20 | No | - | - | - |
| 35 | Yu-Wu | 16.97 | No | - | - | - |
| 36 | CS395T | 16.19 | No | - | - | - |
| 37 | coco_all_19 | 16.07 | No | - | - | - |