Metric: R@1 (recall at rank 1; higher is better)
| # | Model | R@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | InternVideo2-6B | 55.2 | Yes | InternVideo2: Scaling Foundation Models for Mult... | 2024-03-22 | Code |
| 2 | VAST | 52 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 3 | ONE-PEACE | 42.5 | Yes | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 4 | VALOR | 40.1 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 5 | AL-MixGen + Multi-TTA | 34.7 | No | Exploring Train and Test-Time Augmentations for ... | 2022-10-31 | - |
| 6 | QB-Norm+CE | 23.9 | No | Cross Modal Retrieval with Querybank Normalisation | 2021-12-23 | Code |
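
For readers unfamiliar with the metric, the following is a minimal sketch of how R@1 is commonly computed for cross-modal retrieval. It assumes a square similarity matrix in which entry `[i, j]` scores query `i` against candidate `j`, and that candidate `i` is the single correct match for query `i`. The function name `recall_at_k` and the toy matrix are illustrative only. The evaluation protocols of the listed papers may differ (for example, in handling multiple captions per item), and the sketch assumes the scores in the table are percentages.

```python
import numpy as np


def recall_at_k(similarity: np.ndarray, k: int = 1) -> float:
    """Return the percentage of queries whose correct item is in the top k.

    Assumes similarity[i, j] scores query i against candidate j and that
    candidate i is the only correct match for query i (an assumption of
    this sketch, not a detail taken from the leaderboard).
    """
    # Score each query assigns to its own ground-truth candidate.
    true_scores = np.diag(similarity)[:, None]
    # Zero-based rank: number of candidates scoring strictly higher.
    # Ties are therefore resolved in favour of the ground truth.
    ranks = (similarity > true_scores).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))


# Toy example: 3 queries, 3 candidates (hypothetical scores).
sim = np.array([
    [0.9, 0.1, 0.3],  # query 0: correct item ranked first
    [0.2, 0.4, 0.8],  # query 1: candidate 2 outranks the correct item
    [0.1, 0.2, 0.7],  # query 2: correct item ranked first
])

r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
print(f"R@1 = {r1:.1f}, R@2 = {r2:.1f}")
```

In this toy case two of the three queries retrieve their correct item at rank 1, and all three do so within the top 2.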