Zero-Shot Video Retrieval on YouCook2

Metric: text-to-video R@10 (higher is better)
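
As a rough illustration of the metric, text-to-video R@K is the fraction of text queries whose matching video appears among the K highest-scoring videos. The sketch below is not taken from any of the listed papers. The function name `recall_at_k`, the toy similarity scores, and the assumption that query i matches video i are all hypothetical choices made for illustration.

```python
def recall_at_k(similarity, k):
    """Fraction of text queries whose matching video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption (illustrative): the correct video for query i is video i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Zero-based rank: number of videos scored strictly higher
        # than the correct one.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy example with made-up scores: 3 text queries, 3 videos.
sim = [
    [0.9, 0.1, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.2],  # correct video ranked 2nd
    [0.7, 0.6, 0.1],  # correct video ranked 3rd
]
r_at_1 = recall_at_k(sim, 1)
r_at_2 = recall_at_k(sim, 2)
r_at_3 = recall_at_k(sim, 3)
```

The function returns a fraction between 0 and 1. The scores in the results table appear to be percentages, so a value would be multiplied by 100 to compare against them.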


Results


#	Model	text-to-video R@10	Extra Data	Paper	Date	Code
1	OmniVec2	70.8	No	-	-	-
2	Norton	64.1	No	Multi-granularity Correspondence Learning from Long-term Noisy Videos	2024-01-30	Code
3	VideoCLIP	63.1	No	VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	2021-09-28	Code
4	TACo	55.7	No	TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment	2021-08-23	-
5	VAST, HowToCaption-finetuned	53.9	No	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	2023-10-07	Code
6	VideoCoCa	53.3	No	VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	2022-12-09	-
7	MIL-NCE	51.2	No	End-to-End Learning of Visual Representations from Uncurated Instructional Videos	2019-12-13	Code
8	VATT-MBS	45.5	No	VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text	2021-04-22	Code
9	HowToCaption	44.1	No	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	2023-10-07	Code