Image Retrieval on Flickr30K 1K test
Metric: R@1, the percentage of text queries whose ground-truth image is ranked first among the test images (higher is better)
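A minimal sketch of how an R@K score like this can be computed. It assumes the commonly used 1K test split of 1,000 images with five captions each, ordered so that caption q describes image q // 5. It also assumes a precomputed caption-by-image similarity matrix `sim`. The function name `recall_at_k` and the data layout are hypothetical and are not taken from any of the listed papers.

```python
# Hypothetical sketch: Recall@K for text-to-image retrieval.
# Assumption: sim[q][i] is the similarity between caption q and image i,
# and caption q describes image q // captions_per_image.

def recall_at_k(sim, k=1, captions_per_image=5):
    """Percentage of caption queries whose ground-truth image is in the top k."""
    hits = 0
    for q, scores in enumerate(sim):
        gt = q // captions_per_image
        # Rank images by descending similarity; sorted() is stable,
        # so ties are broken in favour of the lower image index.
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        if gt in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy example: 2 images, 2 captions per image (4 caption queries).
sim = [
    [0.9, 0.1],  # caption 0 -> image 0: ranked first (hit)
    [0.2, 0.8],  # caption 1 -> image 0: ranked second (miss at k=1)
    [0.3, 0.7],  # caption 2 -> image 1: ranked first (hit)
    [0.6, 0.4],  # caption 3 -> image 1: ranked second (miss at k=1)
]
print(recall_at_k(sim, k=1, captions_per_image=2))  # prints 50.0
```

On the real split, `sim` would be a 5,000 x 1,000 matrix. R@1 is the k=1 case, so a score of 86.9 means 86.9% of captions retrieve their own image at rank 1.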
Results
Sorted by R@1, best first. SOTA marks entries the leaderboard flags as state of the art at the time of publication; Code indicates whether a code link is listed.

| # | Model | R@1 | Extra data | SOTA | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | X-VLM (base) | 86.9 | Yes | Yes | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | 2021-11-16 | Yes |
| 2 | RCAR | 62.6 | No | No | Plug-and-Play Regulators for Image-Text Matching | 2023-03-23 | Yes |
| 3 | SGRAF | 58.5 | No | Yes | Similarity Reasoning and Filtration for Image-Text Matching | 2021-01-05 | Yes |
| 4 | LGSGM | 57.4 | No | No | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | 2021-06-04 | Yes |
| 5 | VisualSparta | 57.4 | No | Yes | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words | 2021-01-01 | Yes |
| 6 | TERAN MrSw | 56.5 | No | Yes | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders | 2020-08-12 | Yes |
| 7 | TERAN Symm. | 55.7 | No | No | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders | 2020-08-12 | Yes |
| 8 | VSRN | 54.7 | No | Yes | Visual Semantic Reasoning for Image-Text Matching | 2019-09-06 | Yes |
| 9 | CAMP | 51.5 | No | No | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval | 2019-09-12 | Yes |
| 10 | SCAN i-t | 44 | No | Yes | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Yes |
| 11 | SCO | 41.1 | No | Yes | Learning Semantic Concepts and Order for Image and Sentence Matching | 2017-12-06 | No |
| 12 | DAN | 39.4 | No | Yes | Dual Attention Networks for Multimodal Reasoning and Matching | 2016-11-02 | Yes |
| 13 | 2WayNet (VGG) | 36 | No | Yes | Linking Image and Text with 2-Way Nets | 2016-08-29 | Yes |
| 14 | SM-LSTM (VGG) | 30.2 | No | No | Instance-aware Image and Sentence Matching with Selective Multimodal LSTM | 2016-11-17 | No |
| 15 | SPE | 29.7 | No | Yes | Learning Deep Structure-Preserving Image-Text Embeddings | 2015-11-19 | No |
| 16 | mCNN | 26.2 | No | Yes | Multimodal Convolutional Neural Networks for Matching Image and Sentence | 2015-04-23 | Yes |
| 17 | HGLMM FV | 24.7 | No | No | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | 2015-05-19 | Yes |
| 18 | DVSA (R-CNN, AlexNet) | 15.2 | No | Yes | Deep Visual-Semantic Alignments for Generating Image Descriptions | 2014-12-07 | Yes |
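The SOTA flags on this leaderboard follow a simple rule: an entry is flagged when its R@1 is strictly higher than every result published before it. This rule is an inference from the data, not a documented definition. A small sketch that checks it, assuming that same-date entries are compared best first. The data rows are copied from the leaderboard, and `sota_progression` is a hypothetical helper name.

```python
# Sketch: recover which entries set a new best R@1 at their publication date.
# Rows are (model, date, R@1), copied from the leaderboard.
results = [
    ("X-VLM (base)", "2021-11-16", 86.9),
    ("RCAR", "2023-03-23", 62.6),
    ("SGRAF", "2021-01-05", 58.5),
    ("LGSGM", "2021-06-04", 57.4),
    ("VisualSparta", "2021-01-01", 57.4),
    ("TERAN MrSw", "2020-08-12", 56.5),
    ("TERAN Symm.", "2020-08-12", 55.7),
    ("VSRN", "2019-09-06", 54.7),
    ("CAMP", "2019-09-12", 51.5),
    ("SCAN i-t", "2018-03-21", 44.0),
    ("SCO", "2017-12-06", 41.1),
    ("DAN", "2016-11-02", 39.4),
    ("2WayNet (VGG)", "2016-08-29", 36.0),
    ("SM-LSTM (VGG)", "2016-11-17", 30.2),
    ("SPE", "2015-11-19", 29.7),
    ("mCNN", "2015-04-23", 26.2),
    ("HGLMM FV", "2015-05-19", 24.7),
    ("DVSA (R-CNN, AlexNet)", "2014-12-07", 15.2),
]


def sota_progression(rows):
    """Return models whose R@1 beats every earlier result, in date order."""
    # ISO dates sort correctly as strings; within one date, higher R@1 comes
    # first, so only the best entry of that date can set a new record.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    best = float("-inf")
    flagged = []
    for model, date, r1 in ordered:
        if r1 > best:
            flagged.append(model)
            best = r1
    return flagged


for model in sota_progression(results):
    print(model)
```

Under this rule the function returns 12 models, the same set that carries a SOTA flag above. It runs from DVSA (R-CNN, AlexNet) in 2014 to X-VLM (base) in 2021. RCAR is not flagged despite being newer, because its 62.6 is below X-VLM's 86.9.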