Metric: BLEU4 (higher is better)
| # | Model | BLEU4 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VideoCoCa | 14.7 | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 2 | VLTinT (ae-test split) C3D/Ling | 14.5 | No | VLTinT: Visual-Linguistic Transformer-in-Transfo... | 2022-11-28 | Code |
| 3 | VLCap (ae-test split) - Appearance + Language | 13.38 | No | VLCap: Vision-Language with Contrastive Learning... | 2022-06-26 | Code |
| 4 | COOT (ae-test split) - Only Appearance features | 10.85 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 5 | MART (ae-test split) - Appearance + Flow | 10.33 | No | MART: Memory-Augmented Recurrent Transformer for... | 2020-05-11 | Code |
| 6 | CM² | 2.38 | No | Do You Remember? Dense Video Captioning with Cro... | 2024-04-11 | Code |