Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Cross-Modal Information Retrieval

60 benchmarks, 16 papers

Cross-Modal Information Retrieval (CMIR) is the task of finding relevant items across different modalities: for example, given an image, retrieve a matching text, or vice versa. The main challenge in CMIR is known as the heterogeneity gap: because items from different modalities have different data types, the similarity between them cannot be measured directly. Most CMIR methods published to date therefore attempt to bridge this gap by learning a latent representation space in which the similarity between items from different modalities can be measured.

Source: Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
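As a rough illustration of how the rank-based metrics listed under Benchmarks (R@K, MRR) relate to a shared latent space, the sketch below scores toy text and image embeddings with cosine similarity and ranks the true match for each query. The embeddings, the function names, and the one-to-one pairing between query `i` and gallery item `i` are assumptions made for this example; real benchmarks such as COCO pair each image with several captions and use learned encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors in the shared latent space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_ranks(query_embs, gallery_embs):
    """1-based rank of the true match for each query.

    Assumption: query i is paired with gallery item i (one-to-one).
    """
    ranks = []
    for i, q in enumerate(query_embs):
        scores = [cosine(q, g) for g in gallery_embs]
        # Rank = 1 + number of gallery items scoring strictly higher than the match.
        ranks.append(1 + sum(s > scores[i] for s in scores))
    return ranks


def recall_at_k(ranks, k):
    """Fraction of queries whose true match appears in the top k results."""
    return sum(r <= k for r in ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical 2-D embeddings, standing in for the outputs of learned encoders.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[0.9, 0.1], [0.8, 0.6], [0.1, 0.9]]

# Text-to-image retrieval: texts are queries, images form the gallery.
t2i_ranks = match_ranks(text_embs, image_embs)
print("ranks:", t2i_ranks)
print("Text-to-image R@1:", recall_at_k(t2i_ranks, 1))
print("MRR:", mean_reciprocal_rank(t2i_ranks))
```

Swapping the two arguments of `match_ranks` gives the image-to-text direction, which is why most benchmarks report both sets of R@K values.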

Benchmarks

Cross-Modal Information Retrieval on COCO 2014

Text-to-image R@1, Text-to-image R@5, Text-to-image R@10, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10

Cross-Modal Information Retrieval on Flickr30k

Text-to-image R@1, Image-to-text R@1, Text-to-image R@5, Text-to-image R@10, Image-to-text R@5, Image-to-text R@10

Cross-Modal Information Retrieval on COCO-Noisy

R-Sum, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10

Cross-Modal Information Retrieval on Flickr30K-Noisy

R-Sum, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10

Cross-Modal Information Retrieval on CC152K

R-Sum, Image-to-text R@1, Image-to-text R@5, Image-to-text R@10, Text-to-image R@1, Text-to-image R@5, Text-to-image R@10

Cross-Modal Information Retrieval on ChEBI-20

Hits@1, Hits@10, Mean Rank, Test MRR

Cross-Modal Information Retrieval on Recipe1M

Image-to-text R@1, Text-to-image R@1

Cross-Modal Information Retrieval on CommercialAdsDataset

ADD(S) AUC

Cross-Modal Information Retrieval on ITCPR dataset

Rank-1, mAP

Cross-Modal Information Retrieval on MSCOCO-1k

Image-to-text R@1, Text-to-image R@1

Cross-Modal Information Retrieval on Recipe1M+

Image-to-text R@1, Text-to-image R@1

Cross-Modal Information Retrieval on SoundingEarth

Median Rank, Image-to-sound R@100, Sound-to-image R@100

Cross-Modal Information Retrieval on CUHK-PEDES

Text-to-image MedR

Cross-Modal Information Retrieval on Flickr-8k

Image-to-text R@1, Text-to-image R@1

Cross-Modal Information Retrieval on MS-COCO-2014

Text-to-image R@1

Cross-Modal Information Retrieval on MSCOCO

Image-to-text R@1

Cross-Modal Information Retrieval on RSICD

Mean Recall, Image-to-text R@1, Text-to-image R@1

Cross-Modal Information Retrieval on RSITMD

Image-to-text R@1, Mean Recall, Text-to-image R@1