Video Captioning on ActivityNet Captions
Metric: CIDEr (higher is better)
Results
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | VideoCoCa | 39.3 | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 2 | GVL | 33.33 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Code |
| 3 | CM² | 33.01 | No | Do You Remember? Dense Video Captioning with Cro... | 2024-04-11 | Code |
| 4 | VLCap (ae-test split) - Appearance + Language | 31.29 | No | VLCap: Vision-Language with Contrastive Learning... | 2022-06-26 | Code |
| 5 | PDVC (TSP features, no SCST) | 31.14 | No | End-to-End Dense Video Captioning with Parallel ... | 2021-08-17 | Code |
| 6 | VLTinT (ae-test split) C3D/Ling | 31.13 | No | VLTinT: Visual-Linguistic Transformer-in-Transfo... | 2022-11-28 | Code |
| 7 | COOT (ae-test split) - Only Appearance features | 28.19 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 8 | Vid2Seq | 28 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 9 | VTimeLLM | 27.6 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 10 | MART (ae-test split) - Appearance + Flow | 23.42 | No | MART: Memory-Augmented Recurrent Transformer for... | 2020-05-11 | Code |
| 11 | ADV-INF + Global | 19.4 | No | - | - | Code |