Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Retrieval on MSR-VTT-1kA

Metric: video-to-text R@1 (higher is better)
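
As a rough illustration of what a video-to-text R@1 figure measures, the sketch below computes recall at k from a video-by-text similarity matrix. It is not taken from any of the listed papers. The function name `video_to_text_recall_at_k`, the toy matrix, and the assumption that caption i is the single correct match for video i are all hypothetical choices made for this example; individual papers may handle ties or multiple captions differently.

```python
import numpy as np

def video_to_text_recall_at_k(sim, k=1):
    """Percentage of videos whose matching caption ranks in the top k.

    sim[i, j] is the similarity between video i and caption j.
    Assumption: caption i is the one correct caption for video i.
    """
    sim = np.asarray(sim, dtype=float)
    # Score of the correct caption for each video, as a column vector.
    true_scores = np.diag(sim)[:, None]
    # Zero-based rank: how many captions score strictly higher than the correct one.
    ranks = (sim > true_scores).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))

# Toy similarity matrix for 3 videos and 3 captions (made-up numbers).
toy_sim = np.array([
    [0.9, 0.1, 0.2],  # video 0: correct caption ranked first
    [0.3, 0.2, 0.8],  # video 1: correct caption ranked last
    [0.1, 0.4, 0.7],  # video 2: correct caption ranked first
])

r_at_1 = video_to_text_recall_at_k(toy_sim, k=1)
print(f"video-to-text R@1: {r_at_1:.1f}")
```

Here two of the three videos retrieve their own caption first, so the toy R@1 is about 66.7. Higher values mean the correct caption is retrieved first more often, which is why the leaderboard is sorted in descending order.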

Results

| # | Model | video-to-text R@1 | Extra Data | Paper | Date | Code |
|---|-------|-------------------|------------|-------|------|------|
| 1 | HunYuan_tvr (huge) | 64.8 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 2 | DRL | 56.2 | Yes | Disentangled Representation Learning for Text-Vi... | 2022-03-14 | Code |
| 3 | DMAE (ViT-B/16) | 55.7 | No | Dual-Modal Attention-Enhanced Text-Video Retriev... | 2023-09-20 | Code |
| 4 | HunYuan_tvr | 55.5 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 5 | PIDRo | 54.5 | No | - | - | - |
| 6 | CLIP2TV | 54.1 | Yes | CLIP2TV: Align, Match and Distill for Video-Text... | 2021-11-10 | - |
| 7 | EMCL-Net++ | 51.8 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 8 | CAMoE | 50.3 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 9 | DiffusionRet+QB-Norm | 49.3 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 10 | Cap4Video | 49 | No | Cap4Video: What Can Auxiliary Captions Do for Te... | 2022-12-31 | Code |
| 11 | X-CLIP | 48.9 | No | X-CLIP: End-to-End Multi-grained Contrastive Lea... | 2022-07-15 | Code |
| 12 | PAU | 48.3 | No | Prototype-based Aleatoric Uncertainty Quantifica... | 2023-09-29 | Code |
| 13 | DiffusionRet | 47.7 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 14 | CenterCLIP (ViT-B/16) | 47.7 | Yes | CenterCLIP: Token Clustering for Efficient Text-... | 2022-05-02 | Code |
| 15 | SuMA (ViT-B/16) | 47.3 | No | Video-Text Retrieval by Supervised Sparse Multi-... | 2023-02-19 | Code |
| 16 | UCoFiA | 47.1 | No | Unified Coarse-to-Fine Alignment for Video-Text ... | 2023-09-18 | Code |
| 17 | HBI | 46.8 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
| 18 | EMCL-Net | 46.5 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 19 | X-Pool | 44.4 | Yes | X-Pool: Cross-Modal Language-Video Attention for... | 2022-03-28 | Code |
| 20 | CLIP2Video | 43.3 | Yes | CLIP2Video: Mastering Video-Text Retrieval via I... | 2021-06-21 | Code |
| 21 | Socratic Models | 42.8 | No | Socratic Models: Composing Zero-Shot Multimodal ... | 2022-04-01 | Code |
| 22 | CLIP4Clip | 42.7 | Yes | CLIP4Clip: An Empirical Study of CLIP for End to... | 2021-04-18 | Code |
| 23 | CLIP | 27.2 | Yes | A Straightforward Framework For Video Retrieval ... | 2021-02-24 | Code |