Image Retrieval on Flickr30K 1K test
Metric: R@1 (higher is better)
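As a rough illustration of the metric, R@1 (Recall@1) is commonly computed as the percentage of queries whose correct item is ranked first. The sketch below assumes the usual text-to-image setting, where each caption query has exactly one ground-truth image. The function name `recall_at_k`, the score matrix, and the toy values are hypothetical and are not taken from any of the listed papers.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    gt_image: Sequence[int],
    k: int = 1,
) -> float:
    """Percentage of queries whose ground-truth image is among the top-k scores.

    similarity[q][i] is an assumed model score between caption query q and image i.
    gt_image[q] is the index of the single image that caption q describes.
    """
    hits = 0
    for scores, target in zip(similarity, gt_image):
        # Rank image indices by descending score and keep the best k.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += target in top_k
    return 100.0 * hits / len(gt_image)


# Toy data (made up): 4 caption queries scored against 3 images.
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.4, 0.7, 0.6],
    [0.1, 0.2, 0.3],
]
toy_gt = [0, 1, 2, 2]

r_at_1 = recall_at_k(toy_similarity, toy_gt, k=1)
```

In this toy case three of the four captions rank their own image first. Individual papers may differ in details such as tie handling or how multiple captions per image are treated, so the sketch is not a substitute for each paper's evaluation script.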
Results
| # | Model | R@1 | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | X-VLM (base) | 86.9 | Yes | Multi-Grained Vision Language Pre-Training: Alig... | 2021-11-16 | Code |
| 2 | RCAR | 62.6 | No | Plug-and-Play Regulators for Image-Text Matching | 2023-03-23 | Code |
| 3 | SGRAF | 58.5 | No | Similarity Reasoning and Filtration for Image-Te... | 2021-01-05 | Code |
| 4 | LGSGM | 57.4 | No | A Deep Local and Global Scene-Graph Matching for... | 2021-06-04 | Code |
| 5 | VisualSparta | 57.4 | No | VisualSparta: An Embarrassingly Simple Approach ... | 2021-01-01 | Code |
| 6 | TERAN MrSw | 56.5 | No | Fine-grained Visual Textual Alignment for Cross-... | 2020-08-12 | Code |
| 7 | TERAN Symm. | 55.7 | No | Fine-grained Visual Textual Alignment for Cross-... | 2020-08-12 | Code |
| 8 | VSRN | 54.7 | No | Visual Semantic Reasoning for Image-Text Matching | 2019-09-06 | Code |
| 9 | CAMP | 51.5 | No | CAMP: Cross-Modal Adaptive Message Passing for T... | 2019-09-12 | Code |
| 10 | SCAN i-t | 44 | No | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Code |
| 11 | SCO | 41.1 | No | Learning Semantic Concepts and Order for Image a... | 2017-12-06 | - |
| 12 | DAN | 39.4 | No | Dual Attention Networks for Multimodal Reasoning... | 2016-11-02 | Code |
| 13 | 2WayNet (VGG) | 36 | No | Linking Image and Text with 2-Way Nets | 2016-08-29 | Code |
| 14 | SM-LSTM (VGG) | 30.2 | No | Instance-aware Image and Sentence Matching with ... | 2016-11-17 | - |
| 15 | SPE | 29.7 | No | Learning Deep Structure-Preserving Image-Text Em... | 2015-11-19 | - |
| 16 | mCNN | 26.2 | No | Multimodal Convolutional Neural Networks for Mat... | 2015-04-23 | Code |
| 17 | HGLMM FV | 24.7 | No | Flickr30k Entities: Collecting Region-to-Phrase ... | 2015-05-19 | Code |
| 18 | DVSA (R-CNN, AlexNet) | 15.2 | No | Deep Visual-Semantic Alignments for Generating I... | 2014-12-07 | Code |