Video Captioning on YouCook2
Metric: ROUGE-L (higher is better)
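ROUGE-L scores a generated caption by the longest common subsequence (LCS) it shares with a reference caption. The sketch below is a minimal illustration, not the official scorer behind these results. It makes several assumptions. Tokenization is plain lowercase whitespace splitting, whereas the COCO caption evaluation code uses a PTB tokenizer. It takes the maximum precision and the maximum recall over the reference set and combines them into an F-measure with β = 1.2, the weight that code uses. The function names `lcs_length` and `rouge_l` are hypothetical. The leaderboard values appear to be reported on a 0 to 100 scale, i.e. this score multiplied by 100.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]  # curr[j] = LCS length of the prefix of a seen so far vs b[:j]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, references, beta=1.2):
    """Sentence-level ROUGE-L F-measure (assumed beta=1.2, assumed whitespace tokenization)."""
    cand = candidate.lower().split()
    precisions, recalls = [], []
    for ref in references:
        ref_tokens = ref.lower().split()
        lcs = lcs_length(cand, ref_tokens)
        precisions.append(lcs / len(cand) if cand else 0.0)
        recalls.append(lcs / len(ref_tokens) if ref_tokens else 0.0)
    # Best precision and best recall are taken independently across references.
    p, r = max(precisions), max(recalls)
    if p == 0.0 or r == 0.0:
        return 0.0
    return (1 + beta**2) * p * r / (r + beta**2 * p)


# "add the onions to the pan" vs "add chopped onions to the pan":
# LCS = "add onions to the pan" (5 tokens), so P = R = 5/6 and F = 5/6.
print(round(rouge_l("add the onions to the pan", ["add chopped onions to the pan"]), 4))
```

When precision and recall are equal, the F-measure reduces to that common value regardless of β, which is why the example prints 0.8333. A larger β shifts the weight toward recall, favoring captions that cover more of the reference.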
Results
Rank | Model                  | ROUGE-L | Extra Data | Paper                                               | Date       | Code
---- | ---------------------- | ------- | ---------- | --------------------------------------------------- | ---------- | ----
1    | UniVL + MELTR          | 47.04   | No         | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Yes
2    | UniVL                  | 46.52   | Yes        | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Yes
3    | VLM                    | 41.51   | Yes        | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Yes
4    | TextKG                 | 40.2    | No         | Text with Knowledge Graph Augmented Transformer ... | 2023-03-22 | No
5    | E2vidD6-MASSvid-BiD    | 39.03   | Yes        | Multimodal Pretraining for Dense Video Captioning   | 2020-11-10 | Yes
6    | E2vidD6-MASSalign-BiD  | 39.03   | Yes        | Multimodal Pretraining for Dense Video Captioning   | 2020-11-10 | Yes
7    | COOT                   | 37.94   | Yes        | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Yes
8    | VideoCoCa              | 37.7    | Yes        | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | No
9    | HowToCaption           | 37.3    | No         | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Yes
10   | OmniVL                 | 36.09   | No         | OmniVL: One Foundation Model for Image-Language a... | 2022-09-15 | No
11   | VideoBERT + S3D        | 28.8    | No         | VideoBERT: A Joint Model for Video and Language ... | 2019-04-03 | Yes
12   | Zhou                   | 27.44   | No         | End-to-End Dense Video Captioning with Masked Tr... | 2018-04-03 | Yes