Metric: text-to-video R@5 (higher is better)
| Rank | Model | text-to-video R@5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OmniVec2 | 54.1 | No | - | - | - |
| 2 | Norton | 51.9 | No | Multi-granularity Correspondence Learning from L... | 2024-01-30 | Code |
| 3 | VideoCLIP | 50.4 | No | VideoCLIP: Contrastive Pre-training for Zero-sho... | 2021-09-28 | Code |
| 4 | VAST, HowToCaption-finetuned | 43.6 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
| 5 | TACo | 43.2 | No | TACo: Token-aware Cascade Contrastive Learning f... | 2021-08-23 | - |
| 6 | VideoCoCa | 43 | No | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 7 | MIL-NCE | 38 | No | End-to-End Learning of Visual Representations fr... | 2019-12-13 | Code |
| 8 | HowToCaption | 33.1 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
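
R@5 in the table above is recall at rank 5: the percentage of text queries whose correct video appears among the five highest-scoring retrieved videos. The snippet below is a minimal sketch of how such a score is commonly computed, not the evaluation code used by any listed paper. It assumes a square text-by-video similarity matrix in which query `i` has exactly one ground-truth video at index `i`; the function name `recall_at_k` and this data layout are illustrative assumptions.

```python
import numpy as np


def recall_at_k(similarity: np.ndarray, k: int = 5) -> float:
    """Return text-to-video Recall@k as a percentage.

    Assumption (not from the leaderboard): similarity[i, j] is the score
    between text query i and video j, and video i is the single correct
    match for query i.
    """
    # Score each query assigns to its own ground-truth video.
    correct = np.diag(similarity)
    # Zero-based rank: how many videos score strictly higher than the correct one.
    ranks = (similarity > correct[:, None]).sum(axis=1)
    # A query is a hit when its correct video falls within the top k.
    return float((ranks < k).mean() * 100.0)


if __name__ == "__main__":
    # Toy scores for 3 queries and 3 videos (made-up values for illustration).
    sim = np.array([
        [0.1, 0.9, 0.8],
        [0.2, 0.9, 0.1],
        [0.3, 0.2, 0.1],
    ])
    print(recall_at_k(sim, k=1))
    print(recall_at_k(sim, k=3))
```

Ties are resolved in favor of the correct video here because only strictly higher scores count against it; a stricter protocol could use `>=` and subtract one for the self-comparison.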