Text to Audio Retrieval on AudioCaps

Metric: R@5 (higher is better)
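
As a rough illustration of the metric, R@5 (recall at 5) is the percentage of text queries for which the matching audio clip appears among the five highest-scoring clips. The sketch below assumes one ground-truth clip per caption and a precomputed caption-by-clip similarity matrix. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from any of the listed papers, whose evaluation scripts may differ in detail.

```python
import numpy as np

def recall_at_k(similarity, target_index, k=5):
    """Percentage of text queries whose ground-truth audio ranks in the top k.

    similarity:   (num_queries, num_audio) array, higher means more similar.
    target_index: (num_queries,) array, column of the correct audio per query.
    Assumption: exactly one correct audio clip per text query.
    """
    similarity = np.asarray(similarity, dtype=float)
    target_index = np.asarray(target_index)
    # Score assigned to the correct audio clip for each query.
    target_scores = similarity[np.arange(len(target_index)), target_index]
    # Zero-based rank: number of clips scoring strictly higher than the correct one.
    ranks = (similarity > target_scores[:, None]).sum(axis=1)
    return 100.0 * float((ranks < k).mean())

# Toy data (hypothetical): 3 captions scored against 6 audio clips.
sim = np.array([
    [0.9, 0.1, 0.2, 0.3, 0.4, 0.5],  # correct clip 0 is ranked first
    [0.9, 0.1, 0.2, 0.3, 0.4, 0.5],  # correct clip 1 is ranked last
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # correct clip 2 is ranked fourth
])
targets = np.array([0, 1, 2])

print(f"R@5 = {recall_at_k(sim, targets, k=5):.1f}")
```

In this toy case two of the three captions retrieve their clip within the top five, so the score is two thirds of 100. A leaderboard value such as 77.5 would be the same ratio computed over the full AudioCaps test set.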

#  Model      R@5   Extra Data  Paper                                                Date        Code
1  ONE-PEACE  77.5  Yes         ONE-PEACE: Exploring One General Representation ...  2023-05-18  Code
2  VAST       76.8  Yes         VAST: A Vision-Audio-Subtitle-Text Omni-Modality...  2023-05-29  Code
3  VALOR      73.9  Yes         VALOR: Vision-Audio-Language Omni-Perception Pre...  2023-04-17  Code