Video Captioning on ActivityNet Captions
Metric: METEOR (higher is better)
Results
| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|-------|--------|------------|-------|------|------|
| 1 | VLTinT (ae-test split) C3D/Ling | 17.97 | No | VLTinT: Visual-Linguistic Transformer-in-Transfo... | 2022-11-28 | Code |
| 2 | VLCap (ae-test split) - Appearance + Language | 17.48 | No | VLCap: Vision-Language with Contrastive Learning... | 2022-06-26 | Code |
| 3 | Vid2Seq | 17 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 4 | ADV-INF + Global | 16.36 | No | - | - | Code |
| 5 | COOT (ae-test split) - Only Appearance features | 15.99 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 6 | MART (ae-test split) - Appearance + Flow | 15.68 | No | MART: Memory-Augmented Recurrent Transformer for... | 2020-05-11 | Code |
| 7 | Bi-directional+intra captioning | 11.28 | No | Team RUC_AIM3 Technical Report at Activitynet 20... | 2020-06-14 | - |
| 8 | GVL | 10.03 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Code |
| 9 | TSRM-CMG-HRNN+SCST | 9.71 | No | Dense-Captioning Events in Videos: SYSU Submissi... | 2020-06-21 | Code |
| 10 | PDVC (TSP features, no SCST) | 9.03 | No | End-to-End Dense Video Captioning with Parallel ... | 2021-08-17 | Code |
| 11 | TSP | 8.75 | No | TSP: Temporally-Sensitive Pretraining of Video E... | 2020-11-23 | Code |
| 12 | CM² | 8.55 | No | Do You Remember? Dense Video Captioning with Cro... | 2024-04-11 | Code |
| 13 | BMT | 8.44 | No | A Better Use of Audio-Visual Cues: Dense Video C... | 2020-05-17 | Code |
| 14 | iPerceive (Chadha et al., 2020) | 7.87 | No | iPerceive: Applying Common-Sense Reasoning to Mu... | 2020-11-16 | - |
| 15 | MDVC | 7.31 | No | Multi-modal Dense Video Captioning | 2020-03-17 | Code |