Metric: text-to-video Median Rank (lower is better; rows below are sorted in descending order of the metric)
| # | Model | text-to-video Median Rank | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | LaT | 7 | No | LaT: Latent Translation with Cycle-Consistency f... | 2022-07-11 | - |
| 2 | M. Bain et al. | 7 | No | Frozen in Time: A Joint Video and Image Encoder ... | 2021-04-01 | Code |
| 3 | ALPRO | 6 | No | Align and Prompt: Video-and-Language Pre-trainin... | 2021-12-17 | Code |
| 4 | OA-Trans | 6 | No | Object-aware Video-language Pre-training for Ret... | 2021-12-01 | Code |
| 5 | MILES | 5 | No | MILES: Visual BERT Pre-training with Injected La... | 2022-04-26 | Code |
| 6 | Y. Ge et al. | 5 | No | Bridging Video-text Retrieval with Multiple Choi... | 2022-01-13 | Code |
| 7 | Clover | 4 | Yes | Clover: Towards A Unified Video-Language Alignme... | 2022-07-16 | Code |
| 8 | LanguageBind(ViT-H/14) | 2 | Yes | LanguageBind: Extending Video-Language Pretraini... | 2023-10-03 | Code |
| 9 | LanguageBind(ViT-L/14) | 2 | Yes | LanguageBind: Extending Video-Language Pretraini... | 2023-10-03 | Code |
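
For readers unfamiliar with the metric, Median Rank in text-to-video retrieval is generally the median position of the ground-truth video in each text query's ranked result list, so a rank of 1 means the correct video was retrieved first. The sketch below is a minimal illustration of that definition, not the evaluation code of any listed paper. The function name `median_rank`, the toy similarity matrix, and the assumption that query `i` matches video `i` are all hypothetical choices made for the example.

```python
from statistics import median


def median_rank(sim):
    """Median rank of the ground-truth video over all text queries.

    Assumption for this sketch: sim[i][j] is the similarity between
    text query i and video j, and the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of videos scored strictly higher than the correct one.
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)
    return median(ranks)


# Toy 3x3 similarity matrix (made-up numbers, for illustration only).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.6],  # query 1: correct video ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: correct video ranked 2nd
]
print(median_rank(sim))  # prints 2
```

Ties are broken optimistically here (only strictly higher scores push the rank down); real evaluation scripts may handle ties differently.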