Metric: BLEU-3 (higher is better)
| Rank | Model | BLEU-3 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | UniVL + MELTR | 24.12 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 2 | UniVL | 23.87 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 3 | COOT | 17.97 | Yes | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 4 | VLM | 17.78 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Code |
| 5 | OmniVL | 12.87 | No | OmniVL:One Foundation Model for Image-Language a... | 2022-09-15 | - |
| 6 | VideoBERT + S3D | 7.59 | No | VideoBERT: A Joint Model for Video and Language ... | 2019-04-03 | Code |
| 7 | Zhou | 7.53 | No | End-to-End Dense Video Captioning with Masked Tr... | 2018-04-03 | Code |
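
For readers unfamiliar with the metric, the following is a minimal sketch of a sentence-level BLEU-3 score: the geometric mean of clipped 1-, 2-, and 3-gram precisions, multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, and no smoothing. The function names `ngrams` and `bleu3` are illustrative. The evaluation scripts, tokenization, and corpus-level aggregation behind the scores in the table are not stated here, and they likely differ from this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu3(candidate, reference):
    """Sentence-level BLEU-3 in [0, 1] against one reference (assumption:
    whitespace tokenization, uniform weights, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in (1, 2, 3):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * math.exp(sum(log_precisions) / 3)


if __name__ == "__main__":
    reference = "add the chopped onions to the pan"
    print(bleu3("add the chopped onions to the pan", reference))
    print(bleu3("add the chopped onions", reference))
```

The function returns a value between 0 and 1. The table values appear to be reported on a 0 to 100 scale, so multiply by 100 for a comparable number.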