Metric: ROUGE-L (higher is better)
| # | Model | ROUGE-L | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT2 | 63.19 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 2 | GIT | 63.12 | No | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 3 | Microsoft Cognitive Services team | 61.2 | Yes | Scaling Up Vision-Language Pre-training for Imag... | 2021-11-24 | - |
| 4 | VLAF2 | 58.99 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 58.26 | No | VIVO: Visual Vocabulary Pre-Training for Novel O... | 2020-09-28 | - |
| 6 | icp2ssi1_coco_si_0.02_5_test | 54.59 | No | - | - | - |
| 7 | test_cbs2 | 53.39 | No | - | - | - |
| 8 | Human | 52.83 | No | - | - | - |
| 9 | UpDown + ELMo + CBS | 51.82 | No | - | - | - |
| 10 | UpDown | 50.92 | No | - | - | - |
| 11 | Neural Baby Talk | 48.87 | No | - | - | - |
| 12 | Neural Baby Talk + CBS | 48.74 | No | - | - | - |