Metric: video-to-text R@1 (higher is better)
| # | Model | video-to-text R@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | InternVideo2-6B | 89.3 | Yes | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 2 | InternVideo | 87.2 | No | InternVideo: General Video Foundation Models via... | 2022-12-06 | Code |
| 3 | Unmasked Teacher | 86 | No | Unmasked Teacher: Towards Training-Efficient Vid... | 2023-03-28 | Code |
| 4 | GRAM | 84.6 | Yes | Gramian Multimodal Representation Learning and A... | 2024-12-16 | Code |
| 5 | Cap4Video | 80.9 | No | Cap4Video: What Can Auxiliary Captions Do for Te... | 2022-12-31 | Code |
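
For readers unfamiliar with the metric, the following is a minimal sketch of how video-to-text R@1 is commonly computed. It assumes one ground-truth caption per video (caption `i` belongs to video `i`) and reports the score as a percentage; benchmarks with several captions per video, and the evaluation code of the papers listed above, may differ. The function name `video_to_text_recall_at_k` and the toy similarity matrix are hypothetical, chosen only for illustration.

```python
def video_to_text_recall_at_k(sim, k=1):
    """Recall@k for video-to-text retrieval, as a percentage.

    sim[i][j] is the similarity between video i and text j.
    Assumption: text i is the single ground-truth caption of video i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct text = number of texts scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy similarity matrix (made-up values): 3 videos x 3 texts.
sim = [
    [0.9, 0.1, 0.3],  # video 0: correct text ranked first
    [0.2, 0.4, 0.8],  # video 1: correct text ranked second
    [0.1, 0.2, 0.7],  # video 2: correct text ranked first
]

r_at_1 = video_to_text_recall_at_k(sim, k=1)
r_at_2 = video_to_text_recall_at_k(sim, k=2)
```

In this toy case two of the three videos retrieve their own caption first, so R@1 is about 66.7, and every correct caption is within the top two, so R@2 is 100.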