Metric: video-to-text R@5 (higher is better)
| # | Model | video-to-text R@5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | vid-TLDR (UMT-L) | 89.8 | Yes | vid-TLDR: Training Free Token merging for Light-... | 2024-03-20 | Code |
| 2 | UMT-L (ViT-L/16) | 89.6 | Yes | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 3 | HunYuan_tvr | 79.9 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 4 | Cap4Video | 78.5 | No | Cap4Video: What Can Auxiliary Captions Do for Te... | 2022-12-31 | Code |
| 5 | HunYuan_tvr (huge) | 78.3 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 6 | DiffusionRet+QB-Norm | 75.1 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 7 | DiffusionRet | 74.3 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 8 | PAU | 74.2 | No | Prototype-based Aleatoric Uncertainty Quantifica... | 2023-09-29 | Code |
| 9 | HBI | 73.0 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
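
For readers unfamiliar with the metric, video-to-text R@5 is the percentage of query videos whose correct caption appears among the five highest-scoring captions. The following is a minimal sketch, not the evaluation code of any model listed above. It assumes a square similarity matrix in which caption `i` is the single correct match for video `i`; the function name `recall_at_k` and the tie-handling rule are illustrative choices.

```python
import numpy as np


def recall_at_k(sim, k=5):
    """Video-to-text Recall@K as a percentage.

    sim[i, j] is the similarity between video i and caption j.
    Assumption: caption i is the only correct match for video i.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # Score each video assigns to its ground-truth caption (the diagonal).
    gt_scores = sim[np.arange(n), np.arange(n)]
    # Zero-based rank: how many captions score strictly higher than the
    # ground truth. Ties are resolved in favor of the ground truth.
    ranks = (sim > gt_scores[:, None]).sum(axis=1)
    # A query is a hit if its ground truth is within the top k.
    return 100.0 * float((ranks < k).mean())
```

Calling `recall_at_k(sim, k=5)` on a model's video-caption similarity matrix yields a figure on the same 0 to 100 scale as the table. Benchmarks with several reference captions per video need a different ground-truth mapping than the diagonal assumed here.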