Dense Video Captioning on ViTT

Metric: SODA (higher is better)

Rank | Model | SODA | Extra Data | Paper | Date | Code
1 | Vid2Seq (VidChapters-7M PT) | 9.1 | Yes | - | - | -
2 | Vid2Seq (VidChapters-7M PT) | 0.151 | Yes | - | - | -
3 | HiCM² | 0.15 | Yes | HiCM²: Hierarchical Compact Memory Modeling f... | 2024-12-19 | Code
4 | Vid2Seq | 0.135 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code