Cross-Modal Retrieval on SoundingEarth

Metric: Sound-to-image R@100 (higher is better)
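
For readers unfamiliar with the metric, R@K (recall at K) is commonly computed as the fraction of queries whose correct match appears among the top K retrieved items. The sketch below illustrates that general definition. It is not the benchmark's evaluation code. It assumes a one-to-one pairing where sound query i matches image i, and it counts ties in favor of the query. The function name `recall_at_k` and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose paired item (same index) ranks in the top k.

    similarity[i][j] is the score between sound query i and image j.
    Assumption: the correct image for query i is image i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the true match: how many images score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy example: 3 sound queries scored against 3 images (made-up numbers).
scores = [
    [0.9, 0.1, 0.2],  # query 0: true image ranked first
    [0.8, 0.3, 0.5],  # query 1: true image ranked third
    [0.1, 0.7, 0.6],  # query 2: true image ranked second
]
r_at_1 = recall_at_k(scores, 1)
r_at_2 = recall_at_k(scores, 2)
```

Under this definition, a score of 0.434 for R@100 would mean the correct image was in the top 100 results for about 43% of sound queries.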

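| # | Model | Sound-to-image R@100 | Extra Data | Paper | Date | Code |
|---|-------|----------------------|------------|-------|------|------|
| 1 | GeoCLAP | 0.434 | Yes | Learning Tri-modal Embeddings for Zero-Shot Soun... | 2023-09-19 | Code |
| 2 | ResNet-18 | 0.25 | No | Self-supervised Audiovisual Representation Learn... | 2021-08-02 | Code |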