Image Retrieval with Multi-Modal Query on SoundingEarth

Metric: Image-to-sound R@100 (higher is better)
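
For readers unfamiliar with the metric, the following is a minimal sketch of how a recall-at-k score such as R@100 is commonly computed. It assumes each image query has exactly one ground-truth sound, stored at the same index, and that a higher similarity score means a better match. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from either listed paper, whose evaluation protocols may differ in detail.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose paired item ranks within the top k.

    similarity[i][j] is the score between image query i and sound j.
    Assumption: the ground-truth sound for image i is sound i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the true match: how many sounds score strictly higher.
        # Ties are resolved in favour of the true match (an optimistic choice).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy example with 3 images and 3 sounds (hypothetical scores).
scores = [
    [0.9, 0.1, 0.2],  # true match ranked 1st
    [0.8, 0.3, 0.5],  # true match ranked 3rd
    [0.2, 0.6, 0.4],  # true match ranked 2nd
]

print(recall_at_k(scores, 1))
print(recall_at_k(scores, 2))
```

Under this definition, a score of 0.434 would mean that for 43.4% of image queries the paired sound appears among the 100 highest-ranked candidates.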

#  Model      Image-to-sound R@100  Extra Data  Paper                                                Date        Code
1  GeoCLAP    0.434                 Yes         Learning Tri-modal Embeddings for Zero-Shot Soun...  2023-09-19  Code
2  ResNet-18  0.291                 No          Self-supervised Audiovisual Representation Learn...  2021-08-02  Code