Metric: R@10 (Recall@10; higher is better)
| # | Model | R@10 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | 70.39 | Yes | Estimated Audio-Caption Correspondences Improve ... | 2024-08-21 | Code |
| 2 | PaSST–RoBERTa & GPT-augment | 69.3 | Yes | Advancing Natural-Language Based Audio Retrieval... | 2023-08-08 | Code |
| 3 | VAST | 66.1 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 4 | ONE-PEACE | 62.7 | Yes | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 5 | VALOR | 55.3 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
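The leaderboard does not state the dataset or the exact evaluation protocol behind R@10. The sketch below assumes one common reading: each query has exactly one relevant candidate, and R@10 is the percentage of queries whose relevant candidate appears among the 10 highest-scoring candidates. The input layout is an assumption: a square similarity matrix in which row `i` holds the scores of query `i` against every candidate and the relevant candidate sits at column `i`. The function name `recall_at_k` is hypothetical.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 10) -> float:
    """Return Recall@k as a percentage.

    Hypothetical layout (assumption): similarity[i][j] scores query i against
    candidate j, and the single relevant candidate for query i is candidate i.
    """
    if not similarity:
        raise ValueError("similarity matrix must not be empty")
    hits = 0
    for i, row in enumerate(similarity):
        target = row[i]  # score of the relevant candidate for query i
        # 0-based rank = number of candidates scoring strictly higher.
        # Ties are therefore broken in the relevant candidate's favour.
        rank = sum(1 for score in row if score > target)
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


if __name__ == "__main__":
    sims = [
        [0.9, 0.1, 0.2],  # relevant candidate ranks 1st
        [0.5, 0.4, 0.3],  # relevant candidate ranks 2nd
        [0.1, 0.2, 0.8],  # relevant candidate ranks 1st
    ]
    print(recall_at_k(sims, k=1))  # 2 of 3 queries hit
    print(recall_at_k(sims, k=2))  # all 3 queries hit
```

Counting only strictly higher scores is an optimistic tie-breaking choice. A stricter evaluation might count ties against the relevant candidate, which can lower the reported number.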