Metric: BLEU-1 (higher is better)
| # | Model | BLEU-1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GRIT (No VL pretraining - base) | 84.2 | No | GRIT: Faster and Better Image captioning Transfo... | 2022-07-20 | Code |
| 2 | ExpansionNet v2 (No VL pretraining) | 83.5 | No | Exploiting Multiple Sequence Lengths in Fast End... | 2022-08-13 | Code |
| 3 | Xmodal-Ctx | 83.4 | No | Beyond a Pre-Trained Object Detector: Cross-Moda... | 2022-05-09 | Code |
| 4 | Xmodal-Ctx | 81.5 | No | Beyond a Pre-Trained Object Detector: Cross-Moda... | 2022-05-09 | Code |
| 5 | X-Transformer | 80.9 | No | X-Linear Attention Networks for Image Captioning | 2020-03-31 | Code |
| 6 | Meshed-Memory Transformer | 80.8 | No | Meshed-Memory Transformer for Image Captioning | 2019-12-17 | Code |
| 7 | Transformer_NSC | 80.7 | No | A Better Variant of Self-Critical Sequence Train... | 2020-03-22 | Code |
| 8 | RefineCap (w/ REINFORCE) | 80.2 | No | RefineCap: Concept-Aware Refinement for Image Ca... | 2021-09-08 | - |
| 9 | RDN | 80.2 | No | Reflective Decoding Network for Image Captioning | 2019-08-30 | - |