Image Retrieval on Flickr30k-CN

Metric: R@10 (higher is better)
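
As a rough illustration of what the R@10 column measures, the sketch below computes Recall@K for text-to-image retrieval. It assumes the usual definition: each caption query has exactly one correct image, and a query counts as a hit when that image is ranked among the top K. The function name `recall_at_k`, the score-matrix layout, and the tie handling are assumptions made for this sketch. They are not taken from the leaderboard or from the evaluation code of any listed paper.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    gt_image: Sequence[int],
    k: int = 10,
) -> float:
    """Percentage of text queries whose ground-truth image is ranked in the top k.

    scores[q][i] is the similarity between text query q and image i
    (higher means more similar); gt_image[q] is the index of the single
    correct image for query q.
    """
    if len(scores) != len(gt_image):
        raise ValueError("need one ground-truth image index per query")
    if not scores:
        raise ValueError("need at least one query")
    hits = 0
    for row, target in zip(scores, gt_image):
        # Zero-based rank of the target = number of images scored strictly higher.
        # Ties are resolved in the target's favour (an assumption of this sketch).
        rank = sum(1 for s in row if s > row[target])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(scores)


# Toy example: 3 caption queries over 4 images (made-up scores).
toy_scores = [
    [0.9, 0.1, 0.2, 0.3],  # target image 0 is ranked 1st
    [0.8, 0.5, 0.7, 0.1],  # target image 1 is ranked 3rd
    [0.2, 0.9, 0.1, 0.8],  # target image 2 is ranked 4th
]
toy_targets = [0, 1, 2]
r_at_1 = recall_at_k(toy_scores, toy_targets, k=1)
r_at_3 = recall_at_k(toy_scores, toy_targets, k=3)
```

On the toy data, one of three queries is a hit at K=1 and two of three at K=3. A leaderboard value such as 98.7 would correspond to the same percentage computed with K=10 over the benchmark's test queries.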

Results

#	Model	R@10	Extra Data	Paper	Date	Code
1	CN-CLIP (ViT-L/14@336px)	98.7	No	Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese	2022-11-02	Code
2	CN-CLIP (ViT-H/14)	98.6	No	Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese	2022-11-02	Code
3	CN-CLIP (ViT-L/14)	98.6	No	Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese	2022-11-02	Code
4	R2D2 (ViT-L/14)	98.4	No	CCMB: A Large-scale Chinese Cross-modal Benchmark	2022-05-08	Code
5	CN-CLIP (ViT-B/16)	97.4	No	Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese	2022-11-02	Code
6	InternVL-G-FT	97.1	No	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks	2023-12-21	Code
7	InternVL-C-FT	97	No	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks	2023-12-21	Code
8	R2D2 (ViT-B)	97	No	CCMB: A Large-scale Chinese Cross-modal Benchmark	2022-05-08	Code
9	Wukong (ViT-L/14)	97	No	Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark	2022-02-14	Code
10	Wukong (ViT-B/32)	94.2	No	Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark	2022-02-14	Code
11	CN-CLIP (RN50)	94.1	No	Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese	2022-11-02	Code

Models carrying the leaderboard's SOTA badge: CN-CLIP (ViT-L/14@336px), R2D2 (ViT-L/14), Wukong (ViT-L/14).