Zero-Shot Video Retrieval on VATEX

Metric: text-to-video R@10 (higher is better)
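
For readers unfamiliar with the metric, here is a minimal sketch of how text-to-video R@K is commonly computed. It is not the evaluation script used by any of the listed papers. It assumes a precomputed text-by-video similarity matrix and exactly one ground-truth video per text query; the function name `recall_at_k` and its arguments are hypothetical.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    gt_index: Sequence[int],
    k: int = 10,
) -> float:
    """Return the percentage of text queries whose ground-truth video
    is ranked within the top k videos by similarity score.

    similarity[i][j] is the (assumed) score between text query i and video j.
    gt_index[i] is the index of the correct video for text query i.
    """
    if len(similarity) == 0 or len(similarity) != len(gt_index):
        raise ValueError("similarity and gt_index must be non-empty and equal in length")

    hits = 0
    for scores, gt in zip(similarity, gt_index):
        # Zero-based rank of the correct video: the number of videos that
        # score strictly higher. Ties are therefore resolved optimistically.
        rank = sum(1 for s in scores if s > scores[gt])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 text queries, 3 candidate videos.
toy_similarity = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.4],
]
toy_gt = [0, 1, 2]
print(recall_at_k(toy_similarity, toy_gt, k=2))
```

Under this definition a score such as 99.5 means that, for 99.5% of text queries, the correct video appears among the ten highest-ranked candidates. Published numbers may differ in details such as tie handling or how multiple captions per video are treated.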

Results

#	Model	text-to-video R@10	Extra Data	Paper	Date	Code
1	GRAM	99.5	Yes	Gramian Multimodal Representation Learning and A...	2024-12-16	Code
2	InternVideo2-6B	97.1	Yes	InternVideo2: Scaling Foundation Models for Mult...	2024-03-22	Code
3	InternVideo2-1B	96.9	Yes	InternVideo2: Scaling Foundation Models for Mult...	2024-03-22	Code
4	VideoCoCa	90.1	Yes	VideoCoCa: Video-Text Modeling with Zero-Shot Tr...	2022-12-09	-

#1 GRAM (SOTA)
99.5
text-to-video R@10 · Extra Data · 2024-12-16
Gramian Multimodal Representation Learning and Alignment (Code)

#2 InternVideo2-6B (SOTA)
97.1
text-to-video R@10 · Extra Data · 2024-03-22
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding (Code)

#3 InternVideo2-1B
96.9
text-to-video R@10 · Extra Data · 2024-03-22
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding (Code)

#4 VideoCoCa (SOTA)
90.1
text-to-video R@10 · Extra Data · 2022-12-09
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners