Metric: text-to-video R@1 (Recall@1, %; higher is better)
| # | Model | text-to-video R@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | UMT-L (ViT-L/16) | 73.3 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 2 | vid-TLDR (UMT-L) | 73.1 | Yes | vid-TLDR: Training Free Token merging for Light-... | 2024-03-20 | Code |
| 3 | HiTeA | 55.2 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 4 | VindLU | 53.1 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 5 | Singularity-temporal | 47.4 | Yes | Revealing Single Frame Bias for Video-and-Langua... | 2022-06-07 | Code |
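For readers unfamiliar with the metric in the table above, text-to-video R@1 is the percentage of text queries whose paired video is ranked first among all candidate videos. The sketch below is illustrative only. It assumes a square similarity matrix in which caption `i` is paired with video `i`, and it treats ties optimistically, so the correct video only loses rank to strictly higher scores. The function name `recall_at_k` is hypothetical, and individual papers may handle multiple captions per video or ties differently.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int = 1) -> float:
    """Text-to-video Recall@K in percent (hypothetical helper).

    Assumes sim[i][j] is the similarity between text query i and video j,
    and that the ground-truth video for query i is video i (the diagonal).
    """
    n = len(sim)
    hits = 0
    for i, row in enumerate(sim):
        correct = row[i]
        # 0-based rank of the paired video: count videos scored strictly higher.
        # Ties therefore resolve in favor of the correct video (optimistic).
        rank = sum(1 for score in row if score > correct)
        if rank < k:
            hits += 1
    return 100.0 * hits / n


# Three text queries scored against three videos.
sim = [
    [0.9, 0.1, 0.3],  # query 0: paired video ranked 1st
    [0.2, 0.4, 0.8],  # query 1: paired video ranked 2nd
    [0.1, 0.2, 0.7],  # query 2: paired video ranked 1st
]
print(recall_at_k(sim, k=1))  # 2 of 3 queries hit at rank 1
print(recall_at_k(sim, k=2))  # all 3 queries hit within the top 2
```

Under these assumptions a score of 73.3 in the table would mean roughly 73.3% of the test captions retrieved their paired video at the top rank.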