Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video on ActivityNet

Metric: text-to-video R@5 (higher is better)
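The page does not spell out the evaluation protocol, so the following is only a minimal sketch of how Recall@K is commonly computed for text-to-video retrieval. It assumes a square similarity matrix in which text query i is paired with video i; the function name `recall_at_k` and the toy scores are hypothetical and are not taken from any listed paper.

```python
def recall_at_k(similarity, k=5):
    """Percentage of text queries whose paired video ranks in the top k.

    Assumption: similarity[i][j] scores text query i against video j,
    and the correct video for query i is at index i (one-to-one pairing).
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score strictly higher.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (made up): 3 text queries scored against 3 videos.
sim = [
    [0.9, 0.1, 0.2],  # correct video ranked first
    [0.8, 0.3, 0.5],  # correct video ranked third
    [0.2, 0.4, 0.1],  # correct video ranked third
]
print(recall_at_k(sim, k=1))
print(recall_at_k(sim, k=3))
```

Under this definition, a score such as 90.9 would mean the correct video appears among the top five retrieved videos for 90.9% of the text queries.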


Results

| # | Model | text-to-video R@5 | Extra Data | Paper | Date | Code |
|---|-------|-------------------|------------|-------|------|------|
| 1 | VAST | 90.9 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 2 | VALOR | 90.8 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 3 | UMT-L (ViT-L/16) | 89.1 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 4 | vid-TLDR (UMT-L) | 88.6 | Yes | vid-TLDR: Training Free Token merging for Light-... | 2024-03-20 | Code |
| 5 | CLIP-ViP | 85.7 | Yes | CLIP-ViP: Adapting Pre-trained Image-Text Model ... | 2022-09-14 | Code |
| 6 | HunYuan_tvr | 84.8 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 7 | VindLU | 81.4 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 8 | RTQ | 81.4 | No | RTQ: Rethinking Video-language Understanding Bas... | 2023-12-01 | Code |
| 9 | TESTA (ViT-B/16) | 80.8 | Yes | TESTA: Temporal-Spatial Token Aggregation for Lo... | 2023-10-29 | Code |
| 10 | DMAE (ViT-B/32) | 80.7 | No | Dual-Modal Attention-Enhanced Text-Video Retriev... | 2023-09-20 | Code |
| 11 | EMCL-Net++ | 78.7 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 12 | CAMoE | 77.7 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 13 | HiTeA | 77.1 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 14 | CenterCLIP (ViT-B/16) | 77 | Yes | CenterCLIP: Token Clustering for Efficient Text-... | 2022-05-02 | Code |
| 15 | DiffusionRet | 75.6 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 16 | Singularity | 75.5 | Yes | Revealing Single Frame Bias for Video-and-Langua... | 2022-06-07 | Code |
| 17 | X-CLIP | 75.5 | No | X-CLIP: End-to-End Multi-grained Contrastive Lea... | 2022-07-15 | Code |
| 18 | CLIP4Clip | 73.4 | No | CLIP4Clip: An Empirical Study of CLIP for End to... | 2021-04-18 | Code |
| 19 | HBI | 73 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
| 20 | EMCL-Net | 72.7 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 21 | MMT-Pretrained | 61.4 | Yes | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 22 | TACo | 61.2 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 23 | Ours | 59.1 | No | Video and Text Matching with Conditioned Embeddi... | 2021-10-21 | Code |
| 24 | HD-VILA | 57.4 | No | Advancing High-Resolution Video-Language Represe... | 2021-11-19 | Code |
| 25 | MMT | 54.2 | No | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 26 | Collaborative Experts | 47.7 | No | Use What You Have: Video Retrieval Using Represe... | 2019-07-31 | Code |