Zero-Shot Video Retrieval on LSMDC

Metric: text-to-video Median Rank (lower is better; entries below are listed in descending order of the value)
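
For reference, Median Rank is the median, over all text queries, of the 1-based position at which the ground-truth video appears in the ranked candidate list. The following is a minimal sketch, not the evaluation code of any listed paper. It assumes a square similarity matrix in which `sim[i][j]` scores text query `i` against video `j`, and that video `i` is the correct match for query `i`. The function name `median_rank` and the toy scores are illustrative.

```python
from statistics import median


def median_rank(sim):
    """Median 1-based rank of the ground-truth video over all text queries.

    Assumes sim[i][j] is the similarity of text query i to video j and that
    video i is the correct match for query i (an assumption of this sketch).
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank = 1 + number of candidate videos scored strictly higher than the target.
        rank = 1 + sum(1 for score in row if score > target)
        ranks.append(rank)
    return median(ranks)


# Toy example: 3 queries, 3 videos (hypothetical scores).
sim = [
    [0.9, 0.1, 0.2],  # correct video ranked 1st
    [0.8, 0.5, 0.6],  # correct video ranked 3rd
    [0.3, 0.7, 0.4],  # correct video ranked 2nd
]
print(median_rank(sim))  # prints 2
```

A rank of 1 means the correct video is retrieved first. With an even number of queries, `statistics.median` returns the mean of the two middle ranks, so the metric can take non-integer values.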

Results

#	Model	text-to-video Median Rank	Extra Data	Paper	Date	Code
1	MILES	50.7	No	MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval	2022-04-26	Code
2	Y. Ge et al.	42	No	Bridging Video-text Retrieval with Multiple Choice Questions	2022-01-13	Code
3	HowToCaption	29	No	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	2023-10-07	Code
4	CLIP4Clip	28	Yes	CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval	2021-04-18	Code
5	Clover	24	Yes	Clover: Towards A Unified Video-Language Alignment and Fusion Model	2022-07-16	Code
6	VAST, HowToCaption-finetuned	7	No	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	2023-10-07	Code

#1 MILES (SOTA)
50.7 text-to-video Median Rank · 2022-04-26
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval · Code

#2 Y. Ge et al. (SOTA)
42 text-to-video Median Rank · 2022-01-13
Bridging Video-text Retrieval with Multiple Choice Questions · Code

#3 HowToCaption
29 text-to-video Median Rank · 2023-10-07
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · Code

#4 CLIP4Clip (SOTA)
28 text-to-video Median Rank · Extra Data · 2021-04-18
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval · Code

#5 Clover
24 text-to-video Median Rank · Extra Data · 2022-07-16
Clover: Towards A Unified Video-Language Alignment and Fusion Model · Code

#6 VAST, HowToCaption-finetuned
7 text-to-video Median Rank · 2023-10-07
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · Code