Metric: text-to-video R@50 (higher is better)
| # | Model | text-to-video R@50 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CLIP4Clip | 98.2 | No | CLIP4Clip: An Empirical Study of CLIP for End to... | 2021-04-18 | Code |
| 2 | EMCL-Net++ | 98.1 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 3 | MMT-Pretrained | 94.5 | Yes | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 4 | HD-VILA | 94 | No | Advancing High-Resolution Video-Language Represe... | 2021-11-19 | Code |
| 5 | TACo | 93.4 | Yes | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 6 | MMT | 93.2 | No | Multi-modal Transformer for Video Retrieval | 2020-07-21 | Code |
| 7 | Collaborative Experts | 91.4 | No | Use What You Have: Video Retrieval Using Represe... | 2019-07-31 | Code |
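
For readers unfamiliar with the metric, the following is a minimal sketch of how text-to-video R@K (recall at K, here K = 50) is commonly computed. It is not taken from any of the listed papers. The function name `recall_at_k`, the square similarity-matrix layout, and the assumption that query `i` matches video `i` are all illustrative choices, and the toy scores are made up for demonstration.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    Assumption (hypothetical layout): similarity[i][j] scores text query i
    against video j, and the correct video for query i is video i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        target = row[i]
        # Zero-based rank: how many videos score strictly higher than the
        # correct one. Ties are resolved in favor of the correct video.
        rank = sum(1 for score in row if score > target)
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 example with made-up scores (not benchmark data).
scores = [
    [0.9, 0.1, 0.3],  # correct video ranked first
    [0.8, 0.2, 0.5],  # correct video ranked third
    [0.2, 0.4, 0.7],  # correct video ranked first
]
r_at_1 = recall_at_k(scores, 1)
r_at_3 = recall_at_k(scores, 3)
```

Under this definition a score such as 98.2 would mean that for 98.2% of text queries the correct video appears among the 50 highest-scoring candidates. Individual papers may differ in tie handling and in how multiple captions per video are treated.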