Metric: text-to-video R@5 (higher is better)
| # | Model | text-to-video R@5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | UMT-L (ViT-L/16) | 100 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 2 | vid-TLDR (UMT-L) | 100 | Yes | vid-TLDR: Training Free Token merging for Light-... | 2024-03-20 | Code |
| 3 | HiTeA | 100 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 4 | VindLU | 100 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 5 | Singularity-temporal | 96 | Yes | Revealing Single Frame Bias for Video-and-Langua... | 2022-06-07 | Code |
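
For reference, R@5 conventionally denotes recall at rank 5: the percentage of text queries whose ground-truth video appears among the top 5 retrieved videos. The following is a minimal sketch of that computation, assuming a square similarity matrix in which text query i is paired with video i. The function name `recall_at_k` and the toy scores are illustrative assumptions and are not taken from any of the listed papers.

```python
def recall_at_k(similarity, k=5):
    """Percentage of text queries whose paired video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Zero-based rank: number of videos scoring strictly higher
        # than the ground-truth video for this query.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 example with hypothetical scores (not from any model above).
toy = [
    [0.9, 0.1, 0.2],  # query 0: paired video ranked first
    [0.3, 0.5, 0.8],  # query 1: paired video ranked second
    [0.1, 0.4, 0.7],  # query 2: paired video ranked first
]
r1 = recall_at_k(toy, k=1)
r2 = recall_at_k(toy, k=2)
print(f"R@1 = {r1:.1f}, R@2 = {r2:.1f}")
```

In the toy data, two of the three queries rank their paired video first and the third ranks it second, so recall rises as k grows from 1 to 2. Published evaluations may differ in details such as tie handling or multiple captions per video.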