Metric: CIDEr (higher is better)
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Lyrics | 126.8 | No | Lyrics: Boosting Fine-grained Language-Vision Al... | 2023-12-08 | - |
| 2 | GIT, Single Model | 123.39 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 3 | CoCa - Google Brain | 120.55 | No | - | - | - |
| 4 | Microsoft Cognitive Services team | 114.25 | No | Scaling Up Vision-Language Pre-training for Imag... | 2021-11-24 | - |
| 5 | Prismer | 110.84 | No | Prismer: A Vision-Language Model with Multi-Task... | 2023-03-04 | Code |
| 6 | Single Model | 110.31 | No | SimVLM: Simple Visual Language Model Pretraining... | 2021-08-24 | Code |
| 7 | FudanFVL | 108.29 | No | - | - | - |
| 8 | FudanWYZ | 106.81 | No | - | - | - |
| 9 | IEDA-LAB | 98.08 | No | - | - | - |
| 10 | firethehole | 97.61 | No | - | - | - |
| 11 | vll@mk514 | 93.45 | No | - | - | - |
| 12 | MD | 93 | No | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 92.46 | No | VinVL: Revisiting Visual Representations in Visi... | 2021-01-02 | Code |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 87.56 | No | - | - | - |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 87.34 | No | - | - | - |
| 16 | evertyhing | 86 | No | - | - | - |
| 17 | Human | 85.34 | No | - | - | - |
| 18 | RCAL | 82.88 | No | - | - | - |
| 19 | Oscar | 80.93 | No | - | - | - |
| 20 | vinvl_yuan_cbs | 79.04 | No | - | - | - |
| 21 | cxy_nocaps_training | 78.48 | No | - | - | - |
| 22 | Xinyi | 78.23 | No | - | - | - |
| 23 | camel XE | 75.88 | No | - | - | - |
| 24 | MQ-UpDown-C | 75.58 | No | - | - | - |
| 25 | UpDown + ELMo + CBS | 73.09 | No | - | - | - |
| 26 | ClipCap (Transformer) | 65.83 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code |
| 27 | ClipCap (MLP + GPT2 tuning) | 65.7 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code |
| 28 | Neural Baby Talk + CBS | 61.48 | No | - | - | - |
| 29 | 7_10-7_40000_predict_test.json | 61.48 | No | - | - | - |
| 30 | None | 55.97 | No | - | - | - |
| 31 | nocaps_training | 54.25 | No | - | - | - |
| 32 | UpDown | 54.25 | No | - | - | - |
| 33 | Neural Baby Talk | 53.36 | No | - | - | - |
| 34 | YX | 49.02 | No | - | - | - |
| 35 | area_attention | 48.29 | No | - | - | - |
| 36 | B2 | 47.69 | No | - | - | - |
| 37 | Yu-Wu | 46.18 | No | - | - | - |
| 38 | coco_all_19 | 45.27 | No | - | - | - |
| 39 | CS395T | 39.33 | No | - | - | - |