Image Retrieval on Flickr30k-CN
Metric: R@5 (higher is better)
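The page does not spell out how R@5 is computed. As a minimal sketch, assuming the common retrieval definition (the percentage of queries whose ground-truth item appears among the 5 highest-scoring candidates), it could be calculated as below. The function name `recall_at_k`, the score-matrix layout, and the toy data are illustrative assumptions, not taken from any of the listed papers or their evaluation code.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 5,
) -> float:
    """Return the percentage of queries whose target ranks in the top k.

    scores[q][c] is the similarity between query q and candidate c
    (assumed layout); targets[q] is the index of the single correct
    candidate for query q.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("at least one query is required")

    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k candidates with the highest similarity.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return 100.0 * hits / len(scores)


# Toy example: 3 text queries scored against 6 candidate images.
toy_scores = [
    [0.9, 0.1, 0.3, 0.2, 0.0, 0.5],
    [0.2, 0.1, 0.8, 0.7, 0.6, 0.3],
    [0.1, 0.2, 0.3, 0.9, 0.4, 0.8],
]
toy_targets = [0, 1, 5]
toy_r_at_2 = recall_at_k(toy_scores, toy_targets, k=2)
```

Under this definition a larger k can only keep or raise the score, which is why R@5 values on a leaderboard are typically higher than R@1 values for the same model.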
Results
| # | Model | R@5 | SOTA | Extra Data | Paper | Date | Code |
|---|-------|-----|------|------------|-------|------|------|
| 1 | InternVL-G-FT | 98.7 | SOTA | No | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 2023-12-21 | Code |
| 2 | InternVL-C-FT | 98.5 | | No | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 2023-12-21 | Code |
| 3 | CN-CLIP (ViT-L/14@336px) | 97.1 | SOTA | No | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | 2022-11-02 | Code |
| 4 | CN-CLIP (ViT-H/14) | 96.9 | | No | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | 2022-11-02 | Code |
| 5 | R2D2 (ViT-L/14) | 96.7 | SOTA | No | CCMB: A Large-scale Chinese Cross-modal Benchmark | 2022-05-08 | Code |
| 6 | CN-CLIP (ViT-L/14) | 96.7 | | No | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | 2022-11-02 | Code |
| 7 | CN-CLIP (ViT-B/16) | 94.8 | | No | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | 2022-11-02 | Code |
| 8 | R2D2 (ViT-B) | 94.6 | | No | CCMB: A Large-scale Chinese Cross-modal Benchmark | 2022-05-08 | Code |
| 9 | Wukong (ViT-L/14) | 94.5 | SOTA | No | Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | 2022-02-14 | Code |
| 10 | Wukong (ViT-B/32) | 89.6 | | No | Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | 2022-02-14 | Code |
| 11 | CN-CLIP (RN50) | 89.4 | | No | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | 2022-11-02 | Code |