Image-to-Text Retrieval on Flickr30k
Metric: Recall@5 (higher is better)
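
As a rough illustration of how the metric can be computed for image-to-text retrieval: each image is a query, all captions are ranked by similarity, and the query counts as a hit if at least one of its ground-truth captions lands in the top 5. The sketch below is a minimal version under assumptions; the function name `recall_at_k`, the dense similarity matrix, and the `relevant` list of index sets are hypothetical choices for illustration, not the evaluation code of any model listed here.

```python
import numpy as np

def recall_at_k(similarity, relevant, k=5):
    """Percentage of image queries with at least one relevant caption in the top k.

    similarity: (n_images, n_captions) array, higher means a better match.
    relevant:   list of sets; relevant[i] holds the caption indices describing image i.
    (Both names are illustrative assumptions, not taken from any listed paper.)
    """
    similarity = np.asarray(similarity)
    # Indices of the k highest-scoring captions for each image.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query is a hit if any ground-truth caption appears among its top k.
    hits = [bool(relevant[i] & set(top_k[i].tolist())) for i in range(len(relevant))]
    return 100.0 * sum(hits) / len(hits)

# Toy example: 2 images, 4 captions, 2 ground-truth captions per image.
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.7, 0.2, 0.1],
])
relevant = [{0, 1}, {2, 3}]
print(recall_at_k(scores, relevant, k=1))  # 50.0
print(recall_at_k(scores, relevant, k=3))  # 100.0
```

In the toy example, the second image's best-scoring caption belongs to the first image, so it misses at k=1 but is recovered at k=3. Because one correct caption in the top 5 is enough, scores near 100 are easier to reach than they would be for Recall@1.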
Results
| # | Model | Recall@5 | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | InternVL-G-FT (finetuned, w/o ranking) | 100 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 2 | BLIP-2 ViT-G (zero-shot, 1K test set) | 100 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 3 | ONE-PEACE (finetuned, w/o ranking) | 100 | No | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 4 | InternVL-C-FT (finetuned, w/o ranking) | 100 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 5 | BLIP-2 ViT-L (zero-shot, 1K test set) | 100 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 6 | ERNIE-ViL 2.0 | 99.9 | No | ERNIE-ViL 2.0: Multi-view Contrastive Learning f... | 2022-09-30 | Code |
| 7 | ALBEF | 99.8 | No | Align before Fuse: Vision and Language Represent... | 2021-07-16 | Code |
| 8 | ALBEF | 99.3 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 9 | UNITER | 98 | No | HADA: A Graph-based Amalgamation Framework in Im... | 2023-01-11 | Code |
| 10 | GSMN | 94.3 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 11 | LGSGM | 91.9 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |