Image Retrieval on Flickr30k-CN

Metric: R@1 (higher is better)
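
For readers unfamiliar with the metric, the following is a minimal sketch of how R@1 (Recall@1) is commonly computed for text-to-image retrieval. It assumes each caption query has exactly one ground-truth image and that a higher score means a closer match. The function name `recall_at_k`, the variable names, and the toy scores are hypothetical illustrations, not taken from any of the listed papers or their code.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    gt_image: Sequence[int],
    k: int = 1,
) -> float:
    """Percentage of text queries whose ground-truth image is in the top k.

    similarity[q][i] is the score between text query q and candidate image i
    (higher means more similar); gt_image[q] is the index of the correct image.
    Assumption: exactly one correct image per query.
    """
    if len(similarity) != len(gt_image):
        raise ValueError("need one ground-truth index per query")
    if not similarity:
        raise ValueError("need at least one query")
    hits = 0
    for scores, target in zip(similarity, gt_image):
        # Rank candidate image indices from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 4 caption queries against 3 candidate images (made-up scores).
toy_similarity = [
    [0.9, 0.1, 0.3],  # best match is image 0, ground truth 0: hit
    [0.2, 0.8, 0.5],  # best match is image 1, ground truth 1: hit
    [0.6, 0.7, 0.1],  # best match is image 1, ground truth 0: miss at k=1
    [0.1, 0.2, 0.4],  # best match is image 2, ground truth 2: hit
]
toy_gt = [0, 1, 0, 2]
print(recall_at_k(toy_similarity, toy_gt, k=1))  # 75.0
```

The result is reported as a percentage, matching the scale of the scores in the table. Published evaluations typically compute the scores from model embeddings over the full test split; this sketch only illustrates the counting step.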

Results

#	Model	R@1	Extra Data	Paper	Date	Code
1	InternVL-G-FT	85.9	No	InternVL: Scaling up Vision Foundation Models an...	2023-12-21	Code
2	InternVL-C-FT	85.2	No	InternVL: Scaling up Vision Foundation Models an...	2023-12-21	Code
3	CN-CLIP (ViT-L/14@336px)	84.4	No	Chinese CLIP: Contrastive Vision-Language Pretra...	2022-11-02	Code
4	R2D2 (ViT-L/14)	84.4	No	CCMB: A Large-scale Chinese Cross-modal Benchmark	2022-05-08	Code
5	CN-CLIP (ViT-H/14)	83.8	No	Chinese CLIP: Contrastive Vision-Language Pretra...	2022-11-02	Code
6	CN-CLIP (ViT-L/14)	82.7	No	Chinese CLIP: Contrastive Vision-Language Pretra...	2022-11-02	Code
7	CN-CLIP (ViT-B/16)	79.1	No	Chinese CLIP: Contrastive Vision-Language Pretra...	2022-11-02	Code
8	R2D2 (ViT-B)	78.3	No	CCMB: A Large-scale Chinese Cross-modal Benchmark	2022-05-08	Code
9	Wukong (ViT-L/14)	77.4	No	Wukong: A 100 Million Large-scale Chinese Cross-...	2022-02-14	Code
10	Wukong (ViT-B/32)	67.6	No	Wukong: A 100 Million Large-scale Chinese Cross-...	2022-02-14	Code
11	CN-CLIP (RN50)	66.7	No	Chinese CLIP: Contrastive Vision-Language Pretra...	2022-11-02	Code

#1 InternVL-G-FT (SOTA): 85.9 R@1, 2023-12-21. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
#2 InternVL-C-FT: 85.2 R@1, 2023-12-21. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
#3 CN-CLIP (ViT-L/14@336px): 84.4 R@1, 2022-11-02. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
#4 R2D2 (ViT-L/14) (SOTA): 84.4 R@1, 2022-05-08. CCMB: A Large-scale Chinese Cross-modal Benchmark
#5 CN-CLIP (ViT-H/14): 83.8 R@1, 2022-11-02. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
#6 CN-CLIP (ViT-L/14): 82.7 R@1, 2022-11-02. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
#7 CN-CLIP (ViT-B/16): 79.1 R@1, 2022-11-02. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
#8 R2D2 (ViT-B): 78.3 R@1, 2022-05-08. CCMB: A Large-scale Chinese Cross-modal Benchmark
#9 Wukong (ViT-L/14) (SOTA): 77.4 R@1, 2022-02-14. Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
#10 Wukong (ViT-B/32): 67.6 R@1, 2022-02-14. Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
#11 CN-CLIP (RN50): 66.7 R@1, 2022-11-02. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese