Metric: text-to-video Median Rank (rows ranked by descending value; for median rank, lower values normally indicate better retrieval)
| # | Model | text-to-video Median Rank | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MILES | 50.7 | No | MILES: Visual BERT Pre-training with Injected La... | 2022-04-26 | Code |
| 2 | Y. Ge et al. | 42 | No | Bridging Video-text Retrieval with Multiple Choi... | 2022-01-13 | Code |
| 3 | HowToCaption | 29 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
| 4 | CLIP4Clip | 28 | Yes | CLIP4Clip: An Empirical Study of CLIP for End to... | 2021-04-18 | Code |
| 5 | Clover | 24 | Yes | Clover: Towards A Unified Video-Language Alignme... | 2022-07-16 | Code |
| 6 | VAST, HowToCaption-finetuned | 7 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
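As a point of reference for reading the metric, the sketch below shows how text-to-video median rank is commonly computed from a query-by-video similarity matrix. It is a minimal illustration, not the evaluation code behind any entry above. It assumes that `sim[i][j]` scores text query `i` against video `j`, that the ground-truth video for query `i` is video `i`, and that ties are broken optimistically (only strictly higher scores push the correct video down). The function name `text_to_video_median_rank` is hypothetical. Under this formulation a perfect retriever scores 1, and larger values mean the correct video tends to be ranked further down.

```python
from statistics import median


def text_to_video_median_rank(sim):
    """Median rank of the correct video across all text queries.

    Assumptions (hypothetical setup, not taken from any listed paper):
    - sim[i][j] is the similarity of text query i to video j
    - the ground-truth video for query i is video i
    - ties are broken optimistically (only strictly higher scores count)
    """
    ranks = []
    for i, row in enumerate(sim):
        correct_score = row[i]
        # Rank 1 means the correct video scored highest for this query.
        rank = 1 + sum(1 for score in row if score > correct_score)
        ranks.append(rank)
    return median(ranks)


# Toy 3x3 example: per-query ranks are 1, 3 and 3, so the median rank is 3.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.5],
    [0.4, 0.8, 0.1],
]
print(text_to_video_median_rank(sim))  # prints 3
```

Real benchmarks may differ in tie handling and in how multiple captions per video are matched, so the numbers in the table should not be assumed to follow exactly this procedure.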