Metric: METEOR (higher is better)
| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Vid2Seq | 17 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 2 | ADV-INF + Global | 16.36 | No | - | - | Code |
| 3 | Bi-directional+intra captioning | 11.28 | No | Team RUC_AIM3 Technical Report at Activitynet 20... | 2020-06-14 | - |
| 4 | GVL | 10.03 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Code |
| 5 | TSRM-CMG-HRNN+SCST | 9.71 | No | Dense-Captioning Events in Videos: SYSU Submissi... | 2020-06-21 | Code |
| 6 | PDVC (TSP features, no SCST) | 9.03 | No | End-to-End Dense Video Captioning with Parallel ... | 2021-08-17 | Code |
| 7 | TSP | 8.75 | No | TSP: Temporally-Sensitive Pretraining of Video E... | 2020-11-23 | Code |
| 8 | CM² | 8.55 | No | Do You Remember? Dense Video Captioning with Cro... | 2024-04-11 | Code |
| 9 | BMT | 8.44 | No | A Better Use of Audio-Visual Cues: Dense Video C... | 2020-05-17 | Code |
| 10 | iPerceive (Chadha et al., 2020) | 7.87 | No | iPerceive: Applying Common-Sense Reasoning to Mu... | 2020-11-16 | - |
| 11 | MDVC | 7.31 | No | Multi-modal Dense Video Captioning | 2020-03-17 | Code |
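As a rough illustration of what the METEOR column measures, here is a minimal Python sketch of the original 2005 METEOR formulation. It uses exact unigram matching only, with a greedy left-to-right alignment. The function name `meteor_2005` and the greedy alignment are assumptions for illustration. The original metric also chooses among candidate alignments to minimise crossings and adds stemming and synonym matching, and later METEOR releases used by captioning evaluation tools tune the weights. This sketch will not reproduce the leaderboard numbers, which appear to be scores in [0, 1] multiplied by 100.

```python
def meteor_2005(hypothesis: str, reference: str) -> float:
    """Simplified METEOR (2005 formulation), exact unigram matches only.

    Assumption: a greedy alignment that pairs each hypothesis token with the
    first unused identical reference token. The real metric does more.
    """
    hyp = hypothesis.split()
    ref = reference.split()

    # Greedy exact-match alignment as (hypothesis index, reference index) pairs.
    used = [False] * len(ref)
    alignment = []
    for i, tok in enumerate(hyp):
        for j, ref_tok in enumerate(ref):
            if not used[j] and ref_tok == tok:
                used[j] = True
                alignment.append((i, j))
                break

    m = len(alignment)
    if m == 0:
        return 0.0

    precision = m / len(hyp)
    recall = m / len(ref)
    # Harmonic mean weighted heavily toward recall: 10PR / (R + 9P).
    fmean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a maximal run of matches adjacent in both sentences.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1

    # Fragmentation penalty: 0.5 * (chunks / matched unigrams)^3.
    penalty = 0.5 * (chunks / m) ** 3
    return fmean * (1 - penalty)


if __name__ == "__main__":
    print(meteor_2005("a b", "b a"))  # 0.5: full overlap, but word order swapped
```

Even a perfect match scores slightly below 1 under this formulation, because a single chunk still incurs a penalty of 0.5 · (1/m)³.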