Video on YouCook2
Metric: text-to-video R@1 (higher is better)
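For readers unfamiliar with the metric, here is a minimal sketch of how text-to-video R@1 (Recall@1) is commonly computed. It assumes a square similarity matrix in which row i holds the scores of text query i against every video, and the correct video for query i sits at index i. The function name `recall_at_k`, the toy scores, and the percentage scaling are illustrative assumptions, not details taken from this leaderboard or from any listed paper.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Percentage of text queries whose matching video ranks in the top k.

    Assumption: similarity[i][j] scores text query i against video j,
    and the correct video for query i is video i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = number of videos scoring strictly higher than the correct one
        # (ties are resolved in favour of the correct video).
        rank = sum(1 for score in row if score > target)
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Hypothetical 3x3 score matrix: queries 0 and 2 rank their own video first,
# query 1 ranks video 0 above its own video.
scores = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.1],
    [0.2, 0.4, 0.7],
]
r_at_1 = recall_at_k(scores, k=1)
```

In this toy case two of three queries retrieve the right video first, so `r_at_1` is about 66.7. Individual papers may differ in how they handle ties or build the candidate pool, so the scores in the table are best compared with each paper's evaluation protocol in mind.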
Results
| # | Model | text-to-video R@1 | Extra Data | Paper | Date | Code |
|---|-------|-------------------|------------|-------|------|------|
| 1 | VAST | 50.4 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 2 | UniVL + MELTR | 33.7 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 3 | VideoCLIP | 32.2 | Yes | VideoCLIP: Contrastive Pre-training for Zero-sho... | 2021-09-28 | Code |
| 4 | MDMMT-2 | 32 | Yes | MDMMT-2: Multidomain Multimodal Transformer for ... | 2022-03-14 | - |
| 5 | TACo | 29.6 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 6 | UniVL | 28.9 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 7 | VLM | 27.05 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Code |
| 8 | VideoCLIP (zero-shot) | 22.7 | Yes | VideoCLIP: Contrastive Pre-training for Zero-sho... | 2021-09-28 | Code |
| 9 | VideoCoCa (zero-shot) | 21.7 | No | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 10 | COOT | 16.7 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 11 | Text-Video Embedding | 8.2 | No | HowTo100M: Learning a Text-Video Embedding by Wa... | 2019-06-07 | Code |
| 12 | RoME | 6.3 | No | RoME: Role-aware Mixture-of-Expert Transformer f... | 2022-06-26 | Code |
| 13 | Satar et al. | 5.3 | No | Semantic Role Aware Correlation Transformer for ... | 2022-06-26 | Code |
| 14 | HGLMM FV CCA | 4.6 | No | - | - | - |