Metric: video-to-text Mean Rank (lower is better; rows are sorted by value, descending)
| # | Model | video-to-text Mean Rank | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | DiffusionRet+QB-Norm | 6.7 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 2 | HBI | 6.5 | No | Video-Text as Game Players: Hierarchical Banzhaf... | 2023-03-25 | Code |
| 3 | X-CLIP | 6.4 | No | X-CLIP: End-to-End Multi-grained Contrastive Lea... | 2022-07-15 | Code |
| 4 | DiffusionRet | 6.3 | No | DiffusionRet: Generative Text-Video Retrieval wi... | 2023-03-17 | Code |
| 5 | CenterCLIP (ViT-B/16) | 5.5 | Yes | CenterCLIP: Token Clustering for Efficient Text-... | 2022-05-02 | Code |
| 6 | HunYuan_tvr | 3.4 | Yes | Tencent Text-Video Retrieval: Hierarchical Cross... | 2022-04-07 | - |
| 7 | EMCL-Net | 2 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 8 | EMCL-Net++ | 1 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
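For readers unfamiliar with the metric, the following is a minimal sketch of how a video-to-text Mean Rank is typically computed: each video is used as a query, all candidate captions are ranked by similarity, and the rank position of the correct caption is averaged over all queries. A perfect retriever scores 1.0. The function name `video_to_text_mean_rank`, the toy similarity matrix, and the one-ground-truth-caption-per-video setup are assumptions made for illustration; they are not taken from any of the listed papers, whose evaluation protocols may differ (for example in tie handling or multiple captions per video).

```python
def video_to_text_mean_rank(sim):
    """Average rank of the correct caption when each video is the query.

    sim[i][j] is the similarity between video i and caption j.
    Assumption for this sketch: caption i is the single ground truth for video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank is 1 plus the number of captions scored strictly higher
        # than the ground-truth caption (ties are resolved optimistically).
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)
    return sum(ranks) / len(ranks)


# Toy 3x3 similarity matrix (hypothetical values).
sim = [
    [0.9, 0.1, 0.3],  # video 0: correct caption ranked 1st
    [0.8, 0.5, 0.2],  # video 1: correct caption ranked 2nd
    [0.4, 0.6, 0.7],  # video 2: correct caption ranked 1st
]
mean_rank = video_to_text_mean_rank(sim)
print(f"video-to-text Mean Rank: {mean_rank:.2f}")
```

In this toy example the per-video ranks are 1, 2, and 1, so the mean is 4/3.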