Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Image-to-Text Retrieval on Flickr30k

Metric: Recall@1 (higher is better)
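For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@K is commonly computed for image-to-text retrieval: each image is a query, all captions are ranked by similarity, and the query counts as a hit if any of its ground-truth captions appears in the top K. The function name `recall_at_k`, the toy similarity scores, and the two-captions-per-image layout are illustrative assumptions, not taken from any paper listed here; individual entries may use different evaluation protocols (for example, the 1K test set noted on some rows).

```python
from typing import Sequence, Set


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int = 1,
) -> float:
    """Return Recall@K (as a percentage) for image-to-text retrieval.

    similarity[i][j] is the score between image i and caption j.
    relevant[i] is the set of caption indices that describe image i.
    (Both inputs are hypothetical toy structures for illustration.)
    """
    if not similarity or len(similarity) != len(relevant):
        raise ValueError("similarity and relevant must be non-empty and aligned")

    hits = 0
    for scores, gold in zip(similarity, relevant):
        # Rank caption indices from highest to lowest score, keep the top k.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        # A query is a hit if any ground-truth caption is among the top k.
        if gold.intersection(top_k):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (assumed): 3 images, 6 captions, 2 ground-truth captions per image.
similarity = [
    [0.9, 0.2, 0.1, 0.3, 0.0, 0.1],  # image 0: best caption is 0 (correct)
    [0.1, 0.2, 0.4, 0.8, 0.3, 0.0],  # image 1: best caption is 3 (correct)
    [0.7, 0.1, 0.2, 0.1, 0.6, 0.5],  # image 2: best caption is 0 (wrong)
]
relevant = [{0, 1}, {2, 3}, {4, 5}]

r1 = recall_at_k(similarity, relevant, k=1)
r2 = recall_at_k(similarity, relevant, k=2)
```

In this toy case two of the three images retrieve a correct caption at rank 1, while all three have a correct caption within the top 2, which is why Recall@K can only stay the same or rise as K grows.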


Results

| # | Model | Recall@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | InternVL-G-FT (finetuned, w/o ranking) | 97.9 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 2 | BLIP-2 ViT-G (zero-shot, 1K test set) | 97.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | ONE-PEACE (finetuned, w/o ranking) | 97.6 | No | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 4 | InternVL-C-FT (finetuned, w/o ranking) | 97.2 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 5 | BLIP-2 ViT-L (zero-shot, 1K test set) | 96.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 6 | ERNIE-ViL 2.0 | 96.1 | No | ERNIE-ViL 2.0: Multi-view Contrastive Learning f... | 2022-09-30 | Code |
| 7 | ALBEF | 95.9 | No | Align before Fuse: Vision and Language Represent... | 2021-07-16 | Code |
| 8 | ALBEF | 92.6 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 9 | UNITER | 87.3 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 10 | GSMN | 76.4 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 11 | LGSGM | 71 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |