Video Retrieval on ActivityNet

Metric: text-to-video R@50 (higher is better)
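
For readers unfamiliar with the metric, R@K (recall at K) is commonly computed as the percentage of text queries whose ground-truth video appears among the K highest-scoring videos. The following is a minimal Python sketch under that assumption, not the evaluation code used by this leaderboard or by any listed paper. The function name recall_at_k, the one-to-one pairing of query i with video i, and the toy scores are all illustrative.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        target = row[i]
        # Count videos scoring strictly higher than the true match.
        # Ties are resolved in favour of the true match (an optimistic choice).
        higher = sum(1 for score in row if score > target)
        if higher < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 example with made-up scores (not ActivityNet data).
toy = [
    [0.9, 0.1, 0.3],  # query 0: true video ranked 1st
    [0.8, 0.5, 0.2],  # query 1: true video ranked 2nd
    [0.7, 0.6, 0.1],  # query 2: true video ranked 3rd
]
r1 = recall_at_k(toy, 1)
r2 = recall_at_k(toy, 2)
r3 = recall_at_k(toy, 3)
```

On the toy matrix, one of three queries is a hit at K=1, two at K=2, and all three at K=3. With K=50, the same function would give a text-to-video R@50 value, provided the similarity matrix covers the full test set.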

Results

#	Model	text-to-video R@50	Extra Data	Paper	Date	Code
1	CLIP4Clip	98.2	No	CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval	2021-04-18	Code
2	EMCL-Net++	98.1	No	Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations	2022-11-21	Code
3	MMT-Pretrained	94.5	Yes	Multi-modal Transformer for Video Retrieval	2020-07-21	Code
4	HD-VILA	94	No	Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions	2021-11-19	Code
5	TACo	93.4	Yes	TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment	2021-08-23	-
6	MMT	93.2	No	Multi-modal Transformer for Video Retrieval	2020-07-21	Code
7	Collaborative Experts	91.4	No	Use What You Have: Video Retrieval Using Representations From Collaborative Experts	2019-07-31	Code

Entries marked SOTA on the leaderboard: CLIP4Clip, MMT-Pretrained, Collaborative Experts.