Video Retrieval on QuerYD

Metric: text-to-video R@1 (higher is better)
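
The metric can be illustrated with a minimal sketch. It assumes the common definition of text-to-video Recall@K: the percentage of text queries whose ground-truth video appears among the top K retrieved videos. The function name `recall_at_k`, the toy similarity matrix, and the assumption that query i is paired with video i are illustrative and are not taken from the leaderboard or any listed paper; the exact evaluation protocol behind each entry may differ.

```python
def recall_at_k(similarity, k=1):
    """Return Recall@K as a percentage.

    similarity[i][j] is the score of text query i against video j.
    Assumption: the ground-truth video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct video = number of videos scoring strictly higher.
        # Ties are resolved in favour of the correct video (optimistic).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 text queries scored against 3 videos (hypothetical values).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked first
    [0.2, 0.4, 0.8],  # query 1: correct video ranked second
    [0.1, 0.2, 0.7],  # query 2: correct video ranked first
]
r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
```

In this toy case two of the three queries retrieve their paired video at rank 1, so R@1 is about 66.7, while every paired video falls within the top 2, so R@2 is 100.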

Results

#	Model	text-to-video R@1	Extra Data	Paper	Date	Code
1	TESTA (ViT-B/16)	83.4	Yes	TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding	2023-10-29	Code
2	LF-VILA	69.7	Yes	Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning	2022-10-12	Code
3	VINDLU	67.8	Yes	VindLU: A Recipe for Effective Video-and-Language Pretraining	2022-12-09	Code
4	Frozen	53.8	Yes	Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval	2021-04-01	Code
5	QB-Norm+TT-CE+	15.1	No	Cross Modal Retrieval with Querybank Normalisation	2021-12-23	Code