Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video Retrieval on ActivityNet

Metric: text-to-video R@1 (higher is better)
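The metric above can be illustrated with a minimal sketch. It assumes a square similarity matrix in which `sim[i][j]` scores text query `i` against video `j`, and that the correct video for query `i` sits at index `i`. The function name `recall_at_k` and the toy scores are hypothetical, and individual papers on this leaderboard may use different evaluation protocols.

```python
def recall_at_k(sim, k=1):
    """Percentage of text queries whose matching video ranks in the top k.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    if not sim:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct video = number of videos scoring strictly higher.
        # Ties are therefore resolved in favor of the correct video.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy example: 3 text queries against 3 videos (made-up scores).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: video 2 outranks the correct video 1
    [0.1, 0.2, 0.7],  # query 2: correct video 2 is ranked first
]
print(round(recall_at_k(sim, k=1), 1))  # 66.7
```

Two of the three queries retrieve their own video first, so R@1 is about 66.7 in this toy case. The leaderboard values are reported on the same 0 to 100 scale.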


Results

| # | Model | text-to-video R@1 | Extra Data | Paper | Date | Code |
|---|-------|-------------------|------------|-------|------|------|
| 1 | InternVideo2-6B | 74.1 | Yes | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 2 | VAST | 70.5 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 3 | VALOR | 70.1 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 4 | GRAM | 69.9 | Yes | Gramian Multimodal Representation Learning and A... | 2024-12-16 | Code |
| 5 | COSA | 67.3 | Yes | COSA: Concatenated Sample Pretrained Vision-Lang... | 2023-06-15 | Code |
| 6 | UMT-L (ViT-L/16) | 66.8 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 7 | vid-TLDR (UMT-L) | 66.7 | Yes | vid-TLDR: Training Free Token merging for Light-... | 2024-03-20 | Code |
| 8 | InternVideo | 62.2 | Yes | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 9 | CLIP-ViP | 61.4 | Yes | CLIP-ViP: Adapting Pre-trained Image-Text Model ... | 2022-09-14 | Code |
| 10 | HunYuan_tvr | 57.3 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 11 | VindLU | 55 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 12 | TESTA (ViT-B/16) | 54.8 | Yes | TESTA: Temporal-Spatial Token Aggregation for Lo... | 2023-10-29 | Code |
| 13 | RTQ | 53.5 | No | RTQ: Rethinking Video-language Understanding Bas... | 2023-12-01 | Code |
| 14 | DMAE (ViT-B/32) | 53.4 | No | Dual-Modal Attention-Enhanced Text-Video Retriev... | 2023-09-20 | Code |
| 15 | CAMoE | 51 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 16 | EMCL-Net++ | 50.6 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 17 | HiTeA | 49.7 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 18 | DiffusionRet+QB-Norm | 48.1 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 19 | Singularity | 47.1 | Yes | Revealing Single Frame Bias for Video-and-Langua... | 2022-06-07 | Code |
| 20 | CenterCLIP (ViT-B/16) | 46.2 | Yes | CenterCLIP: Token Clustering for Efficient Text-... | 2022-05-02 | Code |
| 21 | X-CLIP | 46.2 | No | X-CLIP: End-to-End Multi-grained Contrastive Lea... | 2022-07-15 | Code |
| 22 | DiffusionRet | 45.8 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 23 | HBI | 42.2 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
| 24 | EMCL-Net | 41.2 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 25 | CLIP4Clip | 40.5 | No | CLIP4Clip: An Empirical Study of CLIP for End to... | 2021-04-18 | Code |
| 26 | TACo | 30.4 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 27 | MMT-Pretrained | 28.7 | Yes | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 28 | HD-VILA | 28.5 | No | Advancing High-Resolution Video-Language Represe... | 2021-11-19 | Code |
| 29 | Ours | 25.4 | No | Video and Text Matching with Conditioned Embeddi... | 2021-10-21 | Code |
| 30 | MMT | 22.7 | No | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 31 | Collaborative Experts | 20.5 | No | Use What You Have: Video Retrieval Using Represe... | 2019-07-31 | Code |