Image Retrieval on Flickr30K 1K test
Metric: R@10 (higher is better)
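As a rough illustration of the metric, the sketch below computes Recall@K for text-to-image retrieval: the percentage of caption queries whose matching image appears among the K highest-scoring images. The function name `recall_at_k`, the similarity-matrix layout, and the toy scores are assumptions made for this example; the evaluation code behind the listed results may differ, for instance in how ties are broken.

```python
def recall_at_k(similarity, gt_index, k=10):
    """Percentage of queries whose correct item is ranked in the top k.

    similarity: list of rows, one per query; each row holds one score per
                candidate item (higher means more similar).
    gt_index:   list with the index of the correct item for each query.
    """
    hits = 0
    for scores, gt in zip(similarity, gt_index):
        # Zero-based rank = number of items scored strictly higher than the
        # correct one (ties are resolved in favour of the correct item).
        rank = sum(1 for s in scores if s > scores[gt])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(gt_index)


# Toy example (hypothetical scores): 3 caption queries over 4 images.
sim = [
    [0.9, 0.1, 0.3, 0.2],  # correct image 0 is ranked 1st
    [0.8, 0.2, 0.7, 0.6],  # correct image 1 is ranked 4th
    [0.1, 0.5, 0.4, 0.3],  # correct image 2 is ranked 2nd
]
gt = [0, 1, 2]

r_at_1 = recall_at_k(sim, gt, k=1)  # one of three queries hits
r_at_2 = recall_at_k(sim, gt, k=2)  # two of three queries hit
```

With K = 10 and one row per test caption, the same function would yield a value on the 0 to 100 scale used in the results.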
Results
| # | Model | R@10 | Extra Data | SOTA badge | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | X-VLM (base) | 98.7 | Yes | SOTA | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | 2021-11-16 | Yes |
| 2 | RCAR | 91.1 | No | - | Plug-and-Play Regulators for Image-Text Matching | 2023-03-23 | Yes |
| 3 | LGSGM | 90.2 | No | SOTA | A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | 2021-06-04 | Yes |
| 4 | TERAN Symm. | 89.3 | No | SOTA | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders | 2020-08-12 | Yes |
| 5 | SGRAF | 88.8 | No | - | Similarity Reasoning and Filtration for Image-Text Matching | 2021-01-05 | Yes |
| 6 | TERAN MrSw | 88.2 | No | - | Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders | 2020-08-12 | Yes |
| 7 | VSRN | 88.2 | No | SOTA | Visual Semantic Reasoning for Image-Text Matching | 2019-09-06 | Yes |
| 8 | VisualSparta | 88.1 | No | - | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words | 2021-01-01 | Yes |
| 9 | CAMP | 85.3 | No | - | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval | 2019-09-12 | Yes |
| 10 | SCAN i-t | 82.6 | No | SOTA | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Yes |
| 11 | SCO | 80.1 | No | SOTA | Learning Semantic Concepts and Order for Image and Sentence Matching | 2017-12-06 | - |
| 12 | DAN | 79.1 | No | SOTA | Dual Attention Networks for Multimodal Reasoning and Matching | 2016-11-02 | Yes |
| 13 | SM-LSTM (VGG) | 72.3 | No | - | Instance-aware Image and Sentence Matching with Selective Multimodal LSTM | 2016-11-17 | - |
| 14 | SPE | 72.1 | No | SOTA | Learning Deep Structure-Preserving Image-Text Embeddings | 2015-11-19 | - |
| 15 | mCNN | 69.6 | No | SOTA | Multimodal Convolutional Neural Networks for Matching Image and Sentence | 2015-04-23 | Yes |
| 16 | HGLMM FV | 66.8 | No | - | Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | 2015-05-19 | Yes |
| 17 | DVSA (R-CNN, AlexNet) | 50.5 | No | SOTA | Deep Visual-Semantic Alignments for Generating Image Descriptions | 2014-12-07 | Yes |