Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Captioning on ActivityNet Captions

Metric: CIDEr (higher is better)

Results

| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | VideoCoCa | 39.3 | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 2 | GVL | 33.33 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Code |
| 3 | CM² | 33.01 | No | Do You Remember? Dense Video Captioning with Cro... | 2024-04-11 | Code |
| 4 | VLCap (ae-test split) - Appearance + Language | 31.29 | No | VLCap: Vision-Language with Contrastive Learning... | 2022-06-26 | Code |
| 5 | PDVC (TSP features, no SCST) | 31.14 | No | End-to-End Dense Video Captioning with Parallel ... | 2021-08-17 | Code |
| 6 | VLTinT (ae-test split) C3D/Ling | 31.13 | No | VLTinT: Visual-Linguistic Transformer-in-Transfo... | 2022-11-28 | Code |
| 7 | COOT (ae-test split) - Only Appearance features | 28.19 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 8 | Vid2Seq | 28 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 9 | VTimeLLM | 27.6 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 10 | MART (ae-test split) - Appearance + Flow | 23.42 | No | MART: Memory-Augmented Recurrent Transformer for... | 2020-05-11 | Code |
| 11 | ADV-INF + Global | 19.4 | No | - | - | Code |