Metric: text-to-video R@5 (higher is better)
| # | Model | text-to-video R@5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | vid-TLDR (UMT-L) | 93.3 | Yes | vid-TLDR: Training Free Token merging for Light-... | 2024-03-20 | Code |
| 2 | UMT-L (ViT-L/16) | 92.7 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 3 | HiTeA | 89.1 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 4 | VindLU | 81.8 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 5 | Singularity-temporal | 75.9 | Yes | Revealing Single Frame Bias for Video-and-Langua... | 2022-06-07 | Code |
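
For readers unfamiliar with the metric, text-to-video R@5 (Recall@5) is commonly computed as the percentage of text queries whose correct video appears among the five highest-scoring candidates. The sketch below is illustrative only and is not taken from any of the listed papers. The function name `recall_at_k`, the layout of the similarity matrix, and the convention that text `i` matches video `i` are all assumptions made for the example.

```python
def recall_at_k(similarity, k=5):
    """Percentage of text queries whose matching video ranks in the top k.

    Assumptions (hypothetical, for illustration):
      - similarity[i][j] is the score of text query i against video j
      - the correct video for text query i is at index i
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score
        # strictly higher. Ties are resolved in favour of the correct video.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 text queries scored against 3 videos.
toy_similarity = [
    [0.9, 0.1, 0.2],  # correct video (index 0) ranks first
    [0.3, 0.2, 0.8],  # correct video (index 1) ranks third
    [0.1, 0.4, 0.7],  # correct video (index 2) ranks first
]
r_at_1 = recall_at_k(toy_similarity, k=1)
r_at_3 = recall_at_k(toy_similarity, k=3)
```

In this toy case two of the three queries retrieve the correct video at rank 1, so `r_at_1` is about 66.7, while `r_at_3` is 100.0 because every correct video falls within the top three. Published evaluation scripts may handle ties or multiple captions per video differently.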