Metric: text-to-video Recall@5 (R@5, in %; higher is better)
| # | Model | text-to-video R@5 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VAST | 98.2 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 2 | VALOR | 97.1 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 3 | Unmasked Teacher | 95.1 | No | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 4 | Side4Video | 93.5 | No | Side4Video: Spatial-Temporal Side Network for Me... | 2023-11-27 | Code |
| 5 | Cap4Video | 93.1 | No | Cap4Video: What Can Auxiliary Captions Do for Te... | 2022-12-31 | Code |
| 6 | TeachCLIP | 91.9 | No | - | - | Code |
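
For readers unfamiliar with the metric, R@5 is commonly computed as the percentage of text queries whose matching video appears among the five highest-scoring candidates. The following is a minimal sketch of that calculation, not the evaluation code of any listed model. The function name `recall_at_k`, the toy similarity matrix, and the convention that query `i` matches video `i` are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 5) -> float:
    """Percentage of text queries whose matching video ranks in the top k.

    Assumes similarity[i][j] scores text query i against video j, and that
    the ground-truth video for query i sits at index i (an assumption of
    this sketch, not something the leaderboard specifies).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of videos scoring strictly higher than the true match.
        # Ties are therefore resolved in favor of the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 example (made-up scores): queries 0 and 1 rank their own video
# first, while query 2 ranks its own video last.
toy = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.7, 0.6, 0.2],
]
r1 = recall_at_k(toy, k=1)  # two of three queries hit at k=1
r3 = recall_at_k(toy, k=3)  # every query hits once k covers all videos
```

Because a larger `k` can only add hits, R@5 is always at least as high as R@1 for the same model, which is why scores in this table sit close to 100.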