Video Retrieval on MSR-VTT
Metric: text-to-video Median Rank (lower is better)
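Median Rank is the median, over all text queries, of the position at which the ground-truth video appears in the ranked retrieval list, so smaller values indicate better retrieval. The sketch below shows one way the metric could be computed from a text-to-video similarity matrix. The function names, the toy scores, the convention that query i matches video i, and the optimistic handling of ties are all assumptions made for illustration; they are not taken from any of the listed papers.

```python
from statistics import median


def text_to_video_ranks(similarity):
    """Return the 1-based rank of the ground-truth video for each text query.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the
        # ground truth (ties are resolved in the ground truth's favour).
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def median_rank(similarity):
    """Median of the per-query ground-truth ranks (lower is better)."""
    return median(text_to_video_ranks(similarity))


# Hypothetical toy scores: 3 text queries against 3 videos.
toy = [
    [0.9, 0.1, 0.3],  # query 0: ground truth ranked 1st
    [0.8, 0.5, 0.7],  # query 1: ground truth ranked 3rd
    [0.2, 0.6, 0.4],  # query 2: ground truth ranked 2nd
]

print(text_to_video_ranks(toy))
print(median_rank(toy))
```

With these toy scores the per-query ranks are 1, 3, and 2, so the median rank is 2. A perfect system would place every ground-truth video first and reach a median rank of 1.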
Results
| # | Model | text-to-video Median Rank | Extra Data | Paper | Date | Code |
|---|-------|---------------------------|------------|-------|------|------|
| 1 | C+LSTM+SA+FC7 | 55 | No | Learning Language-Visual Embedding for Movie Und... | 2016-09-26 | - |
| 2 | Kaufman | 41 | No | Temporal Tessellation: A Unified Approach for Vi... | 2016-12-21 | Code |
| 3 | JEMC | 29.7 | No | - | - | Code |
| 4 | RoME | 17 | No | RoME: Role-aware Mixture-of-Expert Transformer f... | 2022-06-26 | Code |
| 5 | Collaborative Experts | 16 | No | Use What You Have: Video Retrieval Using Represe... | 2019-07-31 | Code |
| 6 | JSFusion | 13 | No | A Joint Sequence Fusion Model for Video Question... | 2018-08-07 | Code |
| 7 | CLIP | 10 | No | A Straightforward Framework For Video Retrieval ... | 2021-02-24 | Code |
| 8 | Text-Video Embedding | 9 | No | HowTo100M: Learning a Text-Video Embedding by Wa... | 2019-06-07 | Code |
| 9 | MDMMT | 6 | Yes | MDMMT: Multidomain Multimodal Transformer for Vi... | 2021-03-19 | Code |
| 10 | UniVL | 6 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 11 | TACo | 5 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 12 | CLIP2Video | 4 | Yes | CLIP2Video: Mastering Video-Text Retrieval via I... | 2021-06-21 | Code |
| 13 | UniVL + MELTR | 4 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 14 | MDMMT-2 | 3 | Yes | MDMMT-2: Multidomain Multimodal Transformer for ... | 2022-03-14 | - |
| 15 | VIOLET + MELTR | 3 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 16 | CLIP2TV | 3 | Yes | CLIP2TV: Align, Match and Distill for Video-Text... | 2021-11-10 | - |
| 17 | CAMoE | 3 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 18 | COTS | 3 | No | COTS: Collaborative Two-Stream Vision-Language P... | 2022-04-15 | - |
| 19 | Ours | 3 | No | Video and Text Matching with Conditioned Embeddi... | 2021-10-21 | Code |