Zero-Shot Video Retrieval on YouCook2

Metric: text-to-video R@5 (higher is better)
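Text-to-video R@5 is the percentage of text queries whose matching video ranks among the top 5 retrieved videos. The sketch below shows how it could be computed from a query-by-video similarity matrix. It assumes a one-to-one pairing in which the correct video for query i sits at index i. It also resolves ties optimistically by counting only strictly higher scores. Individual papers may use different tie handling, caption pooling, or clip splits, so treat this as an illustration rather than the official evaluation script. The function name `recall_at_k` is hypothetical.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int = 5) -> float:
    """Text-to-video Recall@K, returned as a percentage.

    sim[i][j] is the similarity between text query i and video j.
    Assumption (hypothetical setup): the ground-truth video for query i
    is video i, i.e. captions and clips are paired one-to-one by index.
    """
    n = len(sim)
    if n == 0:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank of the correct video: how many videos score strictly
        # higher than it. Ties are resolved in the query's favour.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / n


if __name__ == "__main__":
    # Toy 3x3 matrix: query 0 ranks its video 1st, query 1 ranks it 2nd,
    # query 2 ranks it 3rd.
    toy = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.1],
        [0.5, 0.6, 0.4],
    ]
    print(f"R@2 = {recall_at_k(toy, k=2):.1f}")  # 2 of 3 queries hit -> 66.7
```

With k=5 on a real benchmark, this is the quantity reported in the "Text-to-video R@5" column. In the zero-shot setting, the similarity matrix comes from a model that was not fine-tuned on YouCook2.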

Results

Rank	Model	Text-to-video R@5 (%)	Extra data	Paper	Date	Code
1	OmniVec2	54.1	No	-	-	-
2	Norton	51.9	No	Multi-granularity Correspondence Learning from Long-term Noisy Videos	2024-01-30	Yes
3	VideoCLIP	50.4	No	VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	2021-09-28	Yes
4	VAST, HowToCaption-finetuned	43.6	No	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	2023-10-07	Yes
5	TACo	43.2	No	TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment	2021-08-23	-
6	VideoCoCa	43	No	VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	2022-12-09	-
7	MIL-NCE	38	No	End-to-End Learning of Visual Representations from Uncurated Instructional Videos	2019-12-13	Yes
8	HowToCaption	33.1	No	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	2023-10-07	Yes