Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Cross-Modal Retrieval

60 benchmarks, 522 papers

Cross-Modal Retrieval (CMR) is the task of retrieving items across different modalities, such as image, text, video, and audio. The core challenge of CMR is the heterogeneity gap: data from different modalities have distinct representations, which makes direct comparison difficult. To address this, most CMR methods learn a shared latent embedding space into which items from different modalities are projected, so that their similarity can be measured with a distance metric.

<span class="description-source">Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study</span>
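
As a rough illustration of the shared-embedding idea in the description above, the following sketch ranks image embeddings against text embeddings and computes a Recall@K score of the kind reported in the benchmarks below. It assumes cosine similarity as the comparison measure, which is one common choice and not something this page specifies. The vectors are hand-made toy values standing in for the output of trained encoders, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings in the shared space."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_candidates(query: Vector, candidates: Sequence[Vector]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(
    queries: Sequence[Vector],
    candidates: Sequence[Vector],
    ground_truth: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose correct candidate appears in the top k results."""
    hits = 0
    for query, target in zip(queries, ground_truth):
        if target in rank_candidates(query, candidates)[:k]:
            hits += 1
    return hits / len(queries)


# Toy embeddings (assumed values, not real encoder outputs).
# In practice a text encoder and an image encoder would produce these.
text_embeddings = [
    [0.9, 0.1, 0.0],  # caption 0
    [0.6, 0.5, 0.1],  # caption 1 (deliberately ambiguous)
    [0.0, 0.2, 0.9],  # caption 2
]
image_embeddings = [
    [1.0, 0.0, 0.1],  # image 0
    [0.0, 1.0, 0.0],  # image 1
    [0.1, 0.0, 1.0],  # image 2
]
ground_truth = [0, 1, 2]  # caption i describes image i

if __name__ == "__main__":
    for k in (1, 2):
        score = recall_at_k(text_embeddings, image_embeddings, ground_truth, k)
        print(f"Text-to-image R@{k}: {score:.3f}")
```

Swapping the roles of `text_embeddings` and `image_embeddings` gives the image-to-text direction. Because both modalities live in the same space, the same ranking function serves either direction.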

Benchmarks

Cross-Modal Retrieval on COCO 2014

Text-to-image R@1, Text-to-image R@5, Text-to-image R@10, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10

Cross-Modal Retrieval on Flickr30k

Text-to-image R@1, Image-to-text R@1, Text-to-image R@5, Text-to-image R@10, Image-to-text R@5, Image-to-text R@10

Cross-Modal Retrieval on COCO-Noisy

R-Sum, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10

Cross-Modal Retrieval on Flickr30K-Noisy

R-Sum, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10

Cross-Modal Retrieval on CC152K

R-Sum, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10

Cross-Modal Retrieval on ChEBI-20

Hits@1, Hits@10, Mean Rank, Test MRR

Cross-Modal Retrieval on Recipe1M

Image-to-text R@1, Text-to-image R@1

Cross-Modal Retrieval on CommercialAdsDataset

ADD(S) AUC

Cross-Modal Retrieval on ITCPR dataset

Rank-1, mAP

Cross-Modal Retrieval on MSCOCO-1k

Image-to-text R@1, Text-to-image R@1

Cross-Modal Retrieval on Recipe1M+

Image-to-text R@1, Text-to-image R@1

Cross-Modal Retrieval on SoundingEarth

Median Rank, Image-to-sound R@100, Sound-to-image R@100

Cross-Modal Retrieval on CUHK-PEDES

Text-to-image MedR

Cross-Modal Retrieval on Flickr-8k

Image-to-text R@1, Text-to-image R@1

Cross-Modal Retrieval on MS-COCO-2014

Text-to-image R@1

Cross-Modal Retrieval on MSCOCO

Image-to-text R@1

Cross-Modal Retrieval on RSICD

Mean Recall, Image-to-text R@1, Text-to-image R@1

Cross-Modal Retrieval on RSITMD

Image-to-text R@1, Mean Recall, Text-to-image R@1