Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Retrieval on YouCook2

Metric: text-to-video R@10 (higher is better)
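
For readers unfamiliar with the metric, here is a minimal sketch of how text-to-video Recall@K is commonly computed. The leaderboard does not state its exact evaluation protocol, so the details here are assumptions: the function name `recall_at_k` is hypothetical, each text query is assumed to have exactly one correct video stored at the same index, ties are resolved in favor of the correct video, and the score is reported as a percentage to match the scale of the values listed on this page.

```python
def recall_at_k(similarity, k=10):
    """Percentage of text queries whose correct video ranks in the top k.

    Assumption: similarity[i][j] scores text query i against video j, and
    the correct video for query i sits at index i (one-to-one pairing).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of videos scored strictly higher than the true match
        # (ties therefore count in favor of the correct video).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 text queries scored against 3 videos.
toy = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked first
    [0.8, 0.3, 0.1],  # query 1: correct video ranked second
    [0.7, 0.6, 0.2],  # query 2: correct video ranked third
]
print(recall_at_k(toy, k=1))
print(recall_at_k(toy, k=2))
print(recall_at_k(toy, k=3))
```

In the toy matrix only one of the three queries has its correct video at rank 1, two have it within the top 2, and all three within the top 3, so the score rises with K. R@10 applies the same idea with K = 10 over the full test set.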

Results

| # | Model | text-to-video R@10 | Extra Data | Paper | Date | Code |
|---|-------|--------------------|------------|-------|------|------|
| 1 | VAST | 80.8 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 2 | VideoCLIP | 75 | Yes | VideoCLIP: Contrastive Pre-training for Zero-sho... | 2021-09-28 | Code |
| 3 | UniVL + MELTR | 74.8 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 4 | MDMMT-2 | 74.8 | Yes | MDMMT-2: Multidomain Multimodal Transformer for ... | 2022-03-14 | - |
| 5 | TACo | 72.7 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 6 | OmniVec | 70.8 | Yes | OmniVec: Learning robust representations with cr... | 2023-11-07 | - |
| 7 | UniVL | 70 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 8 | VLM | 69.38 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Code |
| 9 | OmniVec (pretrained) | 64.2 | Yes | OmniVec: Learning robust representations with cr... | 2023-11-07 | - |
| 10 | VideoCLIP (zero-shot) | 63.1 | Yes | VideoCLIP: Contrastive Pre-training for Zero-sho... | 2021-09-28 | Code |
| 11 | VideoCoCa (zero-shot) | 55.2 | No | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 12 | COOT | 52.3 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 13 | Text-Video Embedding | 35.3 | No | HowTo100M: Learning a Text-Video Embedding by Wa... | 2019-06-07 | Code |
| 14 | RoME | 25.2 | No | RoME: Role-aware Mixture-of-Expert Transformer f... | 2022-06-26 | Code |
| 15 | HGLMM FV CCA | 21.6 | No | - | - | - |
| 16 | Satar et al. | 20.8 | No | Semantic Role Aware Correlation Transformer for ... | 2022-06-26 | Code |