Metric: video-to-text Mean Rank (lower is better; rows are listed from highest to lowest value)
| # | Model | video-to-text Mean Rank | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PO Loss | 39.6 | No | Rudder: A Cross Lingual Video and Text Retrieval... | 2021-03-09 | Code |
| 2 | DiffusionRet | 10.7 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 3 | X-CLIP | 10.5 | No | X-CLIP: End-to-End Multi-grained Contrastive Lea... | 2022-07-15 | Code |
| 4 | DiffusionRet+QB-Norm | 10.3 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 5 | CAMoE | 10.2 | Yes | Improving Video-Text Retrieval by Multi-Stream C... | 2021-09-09 | Code |
| 6 | PAU | 9.8 | No | Prototype-based Aleatoric Uncertainty Quantifica... | 2023-09-29 | Code |
| 7 | HunYuan_tvr (huge) | 9.1 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 8 | HBI | 8.7 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
| 9 | DRL | 7.9 | Yes | Disentangled Representation Learning for Text-Vi... | 2022-03-14 | Code |
| 10 | Cap4Video | 7.3 | No | Cap4Video: What Can Auxiliary Captions Do for Te... | 2022-12-31 | Code |
| 11 | HunYuan_tvr | 7.1 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
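
For readers unfamiliar with the metric, the following is a minimal sketch of how a video-to-text Mean Rank is commonly computed. It is not taken from any of the listed papers. The function name `mean_rank`, the toy similarity matrix, and the assumption that caption `i` is the single correct match for video `i` are all illustrative.

```python
def mean_rank(sim):
    """Average 1-based rank of the correct caption for each video.

    Assumption (illustrative): sim[i][j] is the similarity of video i to
    caption j, and caption i is the ground-truth match for video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of captions scored strictly higher than the correct one.
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)
    return sum(ranks) / len(ranks)


# Toy 3x3 similarity matrix (made-up numbers, not benchmark data).
sim = [
    [0.9, 0.1, 0.3],  # correct caption is ranked 1st
    [0.8, 0.5, 0.6],  # correct caption is ranked 3rd
    [0.2, 0.7, 0.4],  # correct caption is ranked 2nd
]
print(mean_rank(sim))  # (1 + 3 + 2) / 3 = 2.0
```

A rank of 1 means the correct caption was retrieved first for that video. Individual papers may handle ties or multiple captions per video differently, so this sketch should not be read as the evaluation protocol behind the numbers in the table.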