Image Retrieval on Flickr30K 1K test
Metric: R@10 (higher is better)
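As a rough illustration of the metric, R@10 (Recall@10) is commonly computed as the percentage of queries whose ground-truth item appears among the 10 highest-scoring candidates. The sketch below assumes a text-to-image setup in which each caption query has exactly one correct image. The function name `recall_at_k`, the similarity matrix layout, and the toy scores are hypothetical and not taken from any listed paper; individual papers may differ in evaluation details.

```python
def recall_at_k(similarity, gt_image_index, k=10):
    """Percentage of queries whose ground-truth image is ranked in the top k.

    similarity: list of rows, one per text query; each row holds that
        query's score against every candidate image (higher = more similar).
    gt_image_index: for each query, the column index of its correct image.
    Both names are illustrative assumptions, not part of any library.
    """
    hits = 0
    for scores, gt in zip(similarity, gt_image_index):
        # Rank candidate images from most to least similar for this query.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if gt in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(gt_image_index)


# Toy example: 3 caption queries scored against 4 candidate images.
toy_similarity = [
    [0.9, 0.1, 0.2, 0.3],  # correct image 0 is ranked first
    [0.2, 0.5, 0.8, 0.3],  # correct image 1 is ranked second
    [0.1, 0.2, 0.3, 0.4],  # correct image 0 is ranked last
]
toy_ground_truth = [0, 1, 0]

for k in (1, 2, 4):
    print(k, round(recall_at_k(toy_similarity, toy_ground_truth, k), 2))
```

With a real model, the similarity rows would come from scoring every test caption against every test image; the leaderboard values below are the authors' reported figures, not outputs of this sketch.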
Results
| # | Model | R@10 | Extra Data | Paper | Date | Code |
|---|-------|------|------------|-------|------|------|
| 1 | X-VLM (base) | 98.7 | Yes | Multi-Grained Vision Language Pre-Training: Alig... | 2021-11-16 | Code |
| 2 | RCAR | 91.1 | No | Plug-and-Play Regulators for Image-Text Matching | 2023-03-23 | Code |
| 3 | LGSGM | 90.2 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 4 | TERAN Symm. | 89.3 | No | Fine-grained Visual Textual Alignment for Cross-... | 2020-08-12 | Code |
| 5 | SGRAF | 88.8 | No | Similarity Reasoning and Filtration for Image-Te... | 2021-01-05 | Code |
| 6 | TERAN MrSw | 88.2 | No | Fine-grained Visual Textual Alignment for Cross-... | 2020-08-12 | Code |
| 7 | VSRN | 88.2 | No | Visual Semantic Reasoning for Image-Text Matching | 2019-09-06 | Code |
| 8 | VisualSparta | 88.1 | No | VisualSparta: An Embarrassingly Simple Approach ... | 2021-01-01 | Code |
| 9 | CAMP | 85.3 | No | CAMP: Cross-Modal Adaptive Message Passing for T... | 2019-09-12 | Code |
| 10 | SCAN i-t | 82.6 | No | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Code |
| 11 | SCO | 80.1 | No | Learning Semantic Concepts and Order for Image a... | 2017-12-06 | - |
| 12 | DAN | 79.1 | No | Dual Attention Networks for Multimodal Reasoning... | 2016-11-02 | Code |
| 13 | SM-LSTM (VGG) | 72.3 | No | Instance-aware Image and Sentence Matching with ... | 2016-11-17 | - |
| 14 | SPE | 72.1 | No | Learning Deep Structure-Preserving Image-Text Em... | 2015-11-19 | - |
| 15 | mCNN | 69.6 | No | Multimodal Convolutional Neural Networks for Mat... | 2015-04-23 | Code |
| 16 | HGLMM FV | 66.8 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Code |
| 17 | DVSA (R-CNN, AlexNet) | 50.5 | No | Deep Visual-Semantic Alignments for Generating I... | 2014-12-07 | Code |