Video Captioning on ViTT
Metric: CIDEr (higher is better)
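CIDEr scores a candidate caption by its TF-IDF-weighted n-gram cosine similarity to the reference captions. It averages over n = 1 to 4 with uniform weights, so n-grams common across the whole corpus count for little. The sketch below implements that original formulation only. It is not the CIDEr-D variant, which adds clipping, a length penalty and a ×10 factor, and it is not the dense-captioning evaluation used for ViTT. The leaderboard does not state which implementation or scaling produced its numbers, so the output here is not directly comparable to the table. Several details are assumptions: the helper names `ngrams` and `cider` are hypothetical, captions are tokenized by lowercasing and splitting on whitespace, and document frequency is floored at 1 to avoid division by zero.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cider(candidates, references, max_n=4):
    """Plain corpus-level CIDEr sketch (not CIDEr-D).

    candidates: one candidate caption string per item
    references: one list of reference caption strings per item
    Returns (corpus_score, per_item_scores).
    """
    num_items = len(candidates)

    # Document frequency: number of items whose references contain each n-gram.
    doc_freq = Counter()
    for refs in references:
        seen = set()
        for ref in refs:
            toks = ref.lower().split()  # assumption: whitespace tokenization
            for n in range(1, max_n + 1):
                seen.update(ngrams(toks, n))
        doc_freq.update(seen)

    def tfidf(tokens, n):
        counts = ngrams(tokens, n)
        total = sum(counts.values())
        # Term frequency times IDF; max(1, df) guards n-grams absent from all references.
        return {g: (c / total) * math.log(num_items / max(1.0, doc_freq[g]))
                for g, c in counts.items()}

    def cosine(a, b):
        dot = sum(v * b.get(g, 0.0) for g, v in a.items())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    scores = []
    for cand, refs in zip(candidates, references):
        cand_toks = cand.lower().split()
        ref_toks = [r.lower().split() for r in refs]
        score = 0.0
        for n in range(1, max_n + 1):
            g_c = tfidf(cand_toks, n)
            # Average cosine similarity against every reference for this n.
            sim = sum(cosine(g_c, tfidf(rt, n)) for rt in ref_toks) / len(ref_toks)
            score += sim / max_n  # uniform weights w_n = 1/N
        scores.append(score)
    return sum(scores) / num_items, scores
```

A candidate identical to its only reference scores 1.0. A candidate that shares no informative n-grams with its references scores 0.0. N-grams such as "a" that appear in every item's references receive an IDF of log(1) = 0 and contribute nothing.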
Results
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | HiCM² | 51.2 | Yes | HiCM²: Hierarchical Compact Memory Modeling f... | 2024-12-19 | Code |
| 2 | Vid2Seq (VidChapters-7M PT) | 50.9 | Yes | - | - | - |
| 3 | Vid2Seq | 43.5 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 4 | Vid2Seq (VidChapters-7M PT) | 30.2 | Yes | - | - | - |
| 5 | E2ESG | 25 | Yes | End-to-end Dense Video Captioning as Sequence Ge... | 2022-04-18 | - |