Video Retrieval on MSR-VTT-1kA
Metric: text-to-video Median Rank (lower is better)
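For readers unfamiliar with the metric, here is a minimal sketch of how a text-to-video median rank is commonly computed from a query-by-video similarity matrix. It is not taken from any listed paper or from the leaderboard itself. The function names, the toy scores, the assumption that query i has exactly one ground-truth video (video i), and the optimistic tie handling are all illustrative choices. A smaller median rank means the correct video tends to appear nearer the top of the retrieved list.

```python
from statistics import median


def text_to_video_ranks(sim):
    """Return the rank of the correct video for each text query.

    sim[i][j] is the similarity of text query i to video j.
    Assumption (illustrative): the ground-truth video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank 1 means no other video scored strictly higher than the correct one.
        # Ties are resolved optimistically here; evaluation scripts may differ.
        rank = 1 + sum(1 for score in row if score > target)
        ranks.append(rank)
    return ranks


def median_rank(sim):
    """Median of the per-query ranks (lower is better)."""
    return median(text_to_video_ranks(sim))


# Toy 3x3 similarity matrix with made-up scores.
sim = [
    [0.9, 0.1, 0.2],  # correct video (0.9) is ranked 1st
    [0.8, 0.5, 0.6],  # correct video (0.5) is ranked 3rd
    [0.3, 0.7, 0.4],  # correct video (0.4) is ranked 2nd
]

ranks = text_to_video_ranks(sim)
print(ranks)             # [1, 3, 2]
print(median_rank(sim))  # 2
```

Because the median is taken over integer ranks, many strong models end up with the same value, which is why several entries in the table share a score.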
Results
| # | Model | text-to-video Median Rank | Extra Data | Paper | Date | Code |
|---|-------|---------------------------|------------|-------|------|------|
| 1 | JSFusion | 13 | No | A Joint Sequence Fusion Model for Video Question... | 2018-08-07 | Code |
| 2 | HT | 12 | No | HowTo100M: Learning a Text-Video Embedding by Wa... | 2019-06-07 | Code |
| 3 | HT-Pretrained | 9 | No | HowTo100M: Learning a Text-Video Embedding by Wa... | 2019-06-07 | Code |
| 4 | BridgeFormer (Zero-shot) | 7 | No | Bridging Video-text Retrieval with Multiple Choi... | 2022-01-13 | Code |
| 5 | Collaborative Experts | 6 | Yes | Use What You Have: Video Retrieval Using Represe... | 2019-07-31 | Code |
| 6 | CLIP | 4 | Yes | A Straightforward Framework For Video Retrieval ... | 2021-02-24 | Code |
| 7 | UniVL + MELTR | 4 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 8 | TACo | 4 | No | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 9 | VLM | 4 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Code |
| 10 | MMT-Pretrained | 4 | Yes | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 11 | MMT | 4 | No | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 12 | MAC | 3 | Yes | Masked Contrastive Pre-Training for Efficient Vi... | 2022-12-02 | - |
| 13 | BridgeFormer | 3 | Yes | Bridging Video-text Retrieval with Multiple Choi... | 2022-01-13 | Code |
| 14 | VIOLET + MELTR | 3 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 15 | FROZEN | 3 | Yes | Frozen in Time: A Joint Video and Image Encoder ... | 2021-04-01 | Code |
| 16 | X-CLIP | 2 | No | X-CLIP: End-to-End Multi-grained Contrastive Lea... | 2022-07-15 | Code |
| 17 | DiffusionRet | 2 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 18 | DiffusionRet+QB-Norm | 2 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 19 | CAMoE | 2 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 20 | HBI | 2 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
| 21 | PAU | 2 | No | Prototype-based Aleatoric Uncertainty Quantifica... | 2023-09-29 | Code |
| 22 | CenterCLIP (ViT-B/16) | 2 | Yes | CenterCLIP: Token Clustering for Efficient Text-... | 2022-05-02 | Code |
| 23 | QB-Norm+CLIP2Video | 2 | Yes | Cross Modal Retrieval with Querybank Normalisation | 2021-12-23 | Code |
| 24 | X-Pool | 2 | Yes | X-Pool: Cross-Modal Language-Video Attention for... | 2022-03-28 | Code |
| 25 | CLIP2Video | 2 | Yes | CLIP2Video: Mastering Video-Text Retrieval via I... | 2021-06-21 | Code |
| 26 | Clover | 2 | No | Clover: Towards A Unified Video-Language Alignme... | 2022-07-16 | Code |
| 27 | MDMMT | 2 | Yes | MDMMT: Multidomain Multimodal Transformer for Vi... | 2021-03-19 | Code |
| 28 | COTS | 2 | Yes | COTS: Collaborative Two-Stream Vision-Language P... | 2022-04-15 | - |
| 29 | CLIP4Clip | 2 | Yes | CLIP4Clip: An Empirical Study of CLIP for End to... | 2021-04-18 | Code |
| 30 | HunYuan_tvr (huge) | 1 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 31 | CLIP-ViP | 1 | Yes | CLIP-ViP: Adapting Pre-trained Image-Text Model ... | 2022-09-14 | Code |
| 32 | PIDRo | 1 | No | - | - | - |
| 33 | DMAE (ViT-B/16) | 1 | No | Dual-Modal Attention-Enhanced Text-Video Retriev... | 2023-09-20 | Code |
| 34 | STAN | 1 | Yes | Revisiting Temporal Modeling for CLIP-based Imag... | 2023-01-26 | Code |
| 35 | DRL | 1 | Yes | Disentangled Representation Learning for Text-Vi... | 2022-03-14 | Code |
| 36 | CLIP2TV | 1 | Yes | CLIP2TV: Align, Match and Distill for Video-Text... | 2021-11-10 | - |
| 37 | Side4Video | 1 | No | Side4Video: Spatial-Temporal Side Network for Me... | 2023-11-27 | Code |
| 38 | Cap4Video | 1 | No | Cap4Video: What Can Auxiliary Captions Do for Te... | 2022-12-31 | Code |