Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Cross-Modal Information Retrieval on Flickr30k

Metric: Image-to-text R@10 (higher is better)
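As a rough illustration of how a score like Image-to-text R@10 is commonly computed: each image is used as a query, all candidate captions are ranked by similarity, and the query counts as a hit if at least one of its ground-truth captions appears in the top 10. The sketch below assumes that definition; the function name `recall_at_k`, the similarity matrix, and the toy data are hypothetical and are not taken from any listed paper or from this site's evaluation code.

```python
from typing import Sequence, Set


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int = 10,
) -> float:
    """Percentage of image queries with at least one relevant caption in the top k.

    similarity[i][j] is an assumed similarity score between image i and caption j.
    relevant[i] is the set of caption indices that describe image i.
    """
    if len(similarity) == 0 or len(similarity) != len(relevant):
        raise ValueError("similarity and relevant must be non-empty and equally long")

    hits = 0
    for scores, gold in zip(similarity, relevant):
        # Rank caption indices from most to least similar to this image.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # The query is a hit if any ground-truth caption is among the top k.
        if gold & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (hypothetical): 2 images, 4 captions.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # image 0: caption 0 ranks first
    [0.8, 0.2, 0.7, 0.1],  # image 1: caption 0 ranks first, caption 2 second
]
relevant = [{0, 1}, {2, 3}]

r_at_1 = recall_at_k(similarity, relevant, k=1)
r_at_2 = recall_at_k(similarity, relevant, k=2)
```

On this toy data only image 0 has a relevant caption ranked first, while both images have one within the top two. Because a single hit among several ground-truth captions is enough, scores at k=10 tend to saturate, which is consistent with the many entries at 100 in the table below.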

Results

| # | Model | Image-to-text R@10 | Extra Data | Paper | Date | Code |
|---|-------|--------------------|------------|-------|------|------|
| 1 | X2-VLM (large) | 100 | Yes | X$^2$-VLM: All-In-One Pre-trained Model For Visi... | 2022-11-22 | Code |
| 2 | X2-VLM (base) | 100 | Yes | X$^2$-VLM: All-In-One Pre-trained Model For Visi... | 2022-11-22 | Code |
| 3 | BEiT-3 | 100 | Yes | Image as a Foreign Language: BEiT Pretraining fo... | 2022-08-22 | Code |
| 4 | OmniVL (14M) | 100 | Yes | OmniVL:One Foundation Model for Image-Language a... | 2022-09-15 | - |
| 5 | ERNIE-ViL 2.0 | 100 | Yes | ERNIE-ViL 2.0: Multi-view Contrastive Learning f... | 2022-09-30 | Code |
| 6 | Aurora (ours, r=128) | 100 | Yes | - | - | - |
| 7 | X-VLM (base) | 100 | Yes | Multi-Grained Vision Language Pre-Training: Alig... | 2021-11-16 | Code |
| 8 | VSE-Gradient | 100 | Yes | Dissecting Deep Metric Learning Losses for Image... | 2022-10-21 | Code |
| 9 | ALIGN | 100 | Yes | Scaling Up Visual and Vision-Language Representa... | 2021-02-11 | Code |
| 10 | ViSTA | 99.6 | Yes | ViSTA: Vision and Scene Text Aggregation for Cro... | 2022-03-31 | - |
| 11 | IAIS | 99.4 | Yes | Learning Relation Alignment for Calibrated Cross... | 2021-05-28 | Code |
| 12 | 3SHNet | 99.2 | No | 3SHNet: Boosting Image-Sentence Retrieval via Vi... | 2024-04-26 | Code |
| 13 | ViLT-B/32 | 98.6 | Yes | ViLT: Vision-and-Language Transformer Without Co... | 2021-02-05 | Code |
| 14 | RCAR | 98.4 | No | Plug-and-Play Regulators for Image-Text Matching | 2023-03-23 | Code |
| 15 | DSMD | 97.7 | No | Dynamic Self-adaptive Multiscale Distillation fr... | 2024-04-16 | Code |
| 16 | SGRAF | 97.4 | No | Similarity Reasoning and Filtration for Image-Te... | 2021-01-05 | Code |
| 17 | GSMN | 97.3 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 18 | Pearl | 97.3 | No | - | - | - |
| 19 | IMRAM | 96.6 | No | IMRAM: Iterative Matching with Recurrent Attenti... | 2020-03-08 | Code |
| 20 | SCAN | 95.8 | No | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Code |
| 21 | Dual-Path (ResNet) | 89.5 | No | Dual-Path Convolutional Image-Text Embeddings wi... | 2017-11-15 | Code |
| 22 | SCO (ResNet) | 89.3 | No | Learning Semantic Concepts and Order for Image a... | 2017-12-06 | - |
| 23 | VSE++ (ResNet) | 87.2 | No | VSE++: Improving Visual-Semantic Embeddings with... | 2017-07-18 | Code |
| 24 | CMPL (ResNet) | 86.1 | No | - | - | Code |