Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video Retrieval on MSR-VTT

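Metric: text-to-video Median Rank (lower is better). The rows under Results are listed from the highest value to the lowest, so the strongest results appear last.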
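As a rough illustration of the metric (a minimal sketch, not the evaluation code of any paper listed here), the median rank can be computed from a text-to-video similarity matrix. The function names, the toy scores, and the assumption that query i's correct video is video i are all hypothetical choices made for this example.

```python
from statistics import median


def text_to_video_ranks(similarity):
    """Return the 1-based rank of the correct video for each text query.

    similarity[i][j] is the score between text query i and video j.
    Assumption for this sketch: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the correct one.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def median_rank(similarity):
    """Median of the per-query ranks; lower means better retrieval."""
    return median(text_to_video_ranks(similarity))


# Toy 3x3 similarity matrix (made-up numbers, for illustration only).
scores = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.5],
    [0.2, 0.7, 0.6],
]
print(text_to_video_ranks(scores))
print(median_rank(scores))
```

A perfect retrieval system would place every correct video first, giving a median rank of 1, which is why smaller values indicate better models.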


Results

| # | Model | text-to-video Median Rank | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | C+LSTM+SA+FC7 | 55 | No | Learning Language-Visual Embedding for Movie Und... | 2016-09-26 | - |
| 2 | Kaufman | 41 | No | Temporal Tessellation: A Unified Approach for Vi... | 2016-12-21 | Code |
| 3 | JEMC | 29.7 | No | - | - | Code |
| 4 | RoME | 17 | No | RoME: Role-aware Mixture-of-Expert Transformer f... | 2022-06-26 | Code |
| 5 | Collaborative Experts | 16 | No | Use What You Have: Video Retrieval Using Represe... | 2019-07-31 | Code |
| 6 | JSFusion | 13 | No | A Joint Sequence Fusion Model for Video Question... | 2018-08-07 | Code |
| 7 | CLIP | 10 | No | A Straightforward Framework For Video Retrieval ... | 2021-02-24 | Code |
| 8 | Text-Video Embedding | 9 | No | HowTo100M: Learning a Text-Video Embedding by Wa... | 2019-06-07 | Code |
| 9 | MDMMT | 6 | Yes | MDMMT: Multidomain Multimodal Transformer for Vi... | 2021-03-19 | Code |
| 10 | UniVL | 6 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 11 | TACo | 5 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 12 | CLIP2Video | 4 | Yes | CLIP2Video: Mastering Video-Text Retrieval via I... | 2021-06-21 | Code |
| 13 | UniVL + MELTR | 4 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 14 | MDMMT-2 | 3 | Yes | MDMMT-2: Multidomain Multimodal Transformer for ... | 2022-03-14 | - |
| 15 | VIOLET + MELTR | 3 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 16 | CLIP2TV | 3 | Yes | CLIP2TV: Align, Match and Distill for Video-Text... | 2021-11-10 | - |
| 17 | CAMoE | 3 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 18 | COTS | 3 | No | COTS: Collaborative Two-Stream Vision-Language P... | 2022-04-15 | - |
| 19 | Ours | 3 | No | Video and Text Matching with Conditioned Embeddi... | 2021-10-21 | Code |