Video Captioning on ViTT

Metric: CIDEr (higher is better)

#  Model                        CIDEr  Extra Data  Paper                                                Date        Code
1  HiCM²                        51.2   Yes         HiCM²: Hierarchical Compact Memory Modeling f...     2024-12-19  Code
2  Vid2Seq (VidChapters-7M PT)  50.9   Yes         -                                                    -           -
3  Vid2Seq                      43.5   Yes         Vid2Seq: Large-Scale Pretraining of a Visual Lan...  2023-02-27  Code
4  Vid2Seq (VidChapters-7M PT)  30.2   Yes         -                                                    -           -
5  E2ESG                        25     Yes         End-to-end Dense Video Captioning as Sequence Ge...  2022-04-18  -