Video Captioning on ActivityNet Captions
Metric: METEOR (higher is better)
Results
| # | Model | METEOR | Extra Data | Paper | Date | Code | SOTA |
|---|-------|--------|------------|-------|------|------|------|
| 1 | VLTinT (ae-test split) C3D/Ling | 17.97 | No | VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | 2022-11-28 | Yes | SOTA |
| 2 | VLCap (ae-test split) - Appearance + Language | 17.48 | No | VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | 2022-06-26 | Yes | SOTA |
| 3 | Vid2Seq | 17 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | 2023-02-27 | Yes | |
| 4 | ADV-INF + Global | 16.36 | No | - | - | Yes | |
| 5 | COOT (ae-test split) - Only Appearance features | 15.99 | No | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | 2020-11-01 | Yes | SOTA |
| 6 | MART (ae-test split) - Appearance + Flow | 15.68 | No | MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | 2020-05-11 | Yes | SOTA |
| 7 | Bi-directional+intra captioning | 11.28 | No | Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning | 2020-06-14 | - | |
| 8 | GVL | 10.03 | No | Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | 2023-03-11 | Yes | |
| 9 | TSRM-CMG-HRNN+SCST | 9.71 | No | Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | 2020-06-21 | Yes | |
| 10 | PDVC (TSP features, no SCST) | 9.03 | No | End-to-End Dense Video Captioning with Parallel Decoding | 2021-08-17 | Yes | |
| 11 | TSP | 8.75 | No | TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | 2020-11-23 | Yes | |
| 12 | CM² | 8.55 | No | Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | 2024-04-11 | Yes | |
| 13 | BMT | 8.44 | No | A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer | 2020-05-17 | Yes | |
| 14 | iPerceive (Chadha et al., 2020) | 7.87 | No | iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | 2020-11-16 | - | |
| 15 | MDVC | 7.31 | No | Multi-modal Dense Video Captioning | 2020-03-17 | Yes | SOTA |