Metric: text-to-video R@1 (higher is better)

| # | Model | text-to-video R@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | TESTA (ViT-B/16) | 83.4 | Yes | TESTA: Temporal-Spatial Token Aggregation for Lo... | 2023-10-29 | Code |
| 2 | LF-VILA | 69.7 | Yes | Long-Form Video-Language Pre-Training with Multi... | 2022-10-12 | Code |
| 3 | VindLU | 67.8 | Yes | VindLU: A Recipe for Effective Video-and-Languag... | 2022-12-09 | Code |
| 4 | Frozen | 53.8 | Yes | Frozen in Time: A Joint Video and Image Encoder ... | 2021-04-01 | Code |
| 5 | QB-Norm+TT-CE+ | 15.1 | No | Cross Modal Retrieval with Querybank Normalisation | 2021-12-23 | Code |
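
For reference, R@1 (recall at rank 1) is conventionally the percentage of text queries whose ground-truth video is ranked first among all candidate videos. The following is a minimal sketch of that computation. It assumes one ground-truth video per query, stored at the same index as the query. The function name `recall_at_k` and the similarity scores are hypothetical and for illustration only; the evaluation protocols behind the entries above may differ.

```python
def recall_at_k(similarity, k=1):
    """Text-to-video R@K as a percentage.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct video = number of videos scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up values, not from any model above).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked first
    [0.2, 0.4, 0.8],  # query 1: video 2 outranks the correct video 1
    [0.1, 0.2, 0.7],  # query 2: correct video ranked first
]

r_at_1 = recall_at_k(sim, k=1)
r_at_2 = recall_at_k(sim, k=2)
```

In this toy case two of the three queries retrieve their video at rank 1, so `r_at_1` is about 66.7, and all three succeed within the top 2.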