Video Captioning on YouCook2
Metric: METEOR (higher is better)
Results
| # | Model | METEOR | Extra Data | Paper | Date | Code available |
|---|-------|--------|------------|-------|------|----------------|
| 1 | UniVL + MELTR | 22.56 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Yes |
| 2 | UniVL | 22.35 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Yes |
| 3 | COOT | 19.85 | Yes | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Yes |
| 4 | E2vidD6-MASSvid-BiD | 18.32 | Yes | Multimodal Pretraining for Dense Video Captioning | 2020-11-10 | Yes |
| 5 | VLM | 18.22 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Yes |
| 6 | MA-LMM | 17.6 | No | MA-LMM: Memory-Augmented Large Multimodal Model ... | 2024-04-08 | Yes |
| 7 | HowToCaption | 15.9 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Yes |
| 8 | OmniVL | 14.83 | No | OmniVL: One Foundation Model for Image-Language a... | 2022-09-15 | - |
| 9 | TextKG | 14.8 | No | Text with Knowledge Graph Augmented Transformer ... | 2023-03-22 | - |
| 10 | HiCM² | 12.8 | Yes | HiCM²: Hierarchical Compact Memory Modeling f... | 2024-12-19 | Yes |
| 11 | Vid2Seq (HowTo100M+VidChapters-7M PT) | 12.3 | Yes | - | - | - |
| 12 | VideoBERT + S3D | 11.94 | No | VideoBERT: A Joint Model for Video and Language ... | 2019-04-03 | Yes |
| 13 | Zhou | 11.55 | No | End-to-End Dense Video Captioning with Masked Tr... | 2018-04-03 | Yes |
| 14 | Vid2Seq | 9.3 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Yes |
| 15 | CM² | 6.08 | No | Do You Remember? Dense Video Captioning with Cro... | 2024-04-11 | Yes |
| 16 | GVL | 5.01 | No | Learning Grounded Vision-Language Representation... | 2023-03-11 | Yes |
| 17 | PDVC (TSN features, no SCST) | 4.74 | No | End-to-End Dense Video Captioning with Parallel ... | 2021-08-17 | Yes |
| 18 | Vid2Seq (HowTo100M+VidChapters-7M PT) | 3.4 | Yes | - | - | - |