Image Retrieval on CREPE (Compositional REPresentation Evaluation)

Metric: Recall@1 (HN-Comp, UC) (higher is better)
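For readers unfamiliar with the metric, here is a minimal sketch of how Recall@1 is commonly computed for a retrieval task: the percentage of queries whose highest-scoring candidate is the ground-truth match. The function names, the score layout (one row of candidate scores per query), and the `targets` argument are assumptions made for illustration; this is not the benchmark's official evaluation code. The `chance_recall_at_1` helper shows that the Random entry's 14.29 would be consistent with uniform guessing among seven candidates per query; that candidate count is inferred from the number and is not stated on this page.

```python
from typing import Sequence


def recall_at_1(scores: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Return the percentage of queries whose top-scored candidate is the target.

    scores[i][j] is the similarity between query i and candidate j (hypothetical layout).
    targets[i] is the index of the ground-truth candidate for query i.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("at least one query is required")

    hits = 0
    for row, target in zip(scores, targets):
        # Index of the highest-scoring candidate; ties resolve to the first maximum.
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == target)
    return 100.0 * hits / len(scores)


def chance_recall_at_1(num_candidates: int) -> float:
    """Expected Recall@1 (in percent) when picking one candidate uniformly at random."""
    return 100.0 / num_candidates


if __name__ == "__main__":
    toy_scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    toy_targets = [0, 1, 1]
    print(recall_at_1(toy_scores, toy_targets))
    print(round(chance_recall_at_1(7), 2))
```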

Results

#	Model	Recall@1 (HN-Comp, UC)	Extra Data	Paper	Date	Code
1	RN-50 (MosaiCLIP, CC-12M)	92.6	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
2	Swin-T (MosaiCLIP, CC-12M)	92.1	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
3	RN-50 (NegCLIP, CC-12M)	82	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
4	Swin-T (NegCLIP, CC-12M)	80.3	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
5	MosaiCLIP (CC-FT)	72.4	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
6	ViT-L-14 (LAION400M)	60.78	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
7	ViT-B-16+240 (LAION400M)	60.19	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
8	ViT-B-16 (LAION400M)	59	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
9	ViT-B-32 (LAION400M)	54.8	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
10	NegCLIP (CC-FT)	53.1	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
11	MosaiCLIP (YFCC-FT)	48.8	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
12	CLIP-FT (CC-FT)	45.8	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
13	RN50 (CC12M)	45.27	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
14	CLIP (CC-FT)	45.1	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
15	Swin-T (CLIP, CC-12M)	44.1	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
16	RN-50 (CLIP, CC-12M)	42.9	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
17	RN50 (YFCC15M)	39.83	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
18	CLIP (YFCC-FT)	39.8	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
19	RN101 (YFCC15M)	39.56	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
20	NegCLIP (YFCC-FT)	38.8	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
21	CLIP-FT (YFCC-FT)	36.4	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
22	Random	14.29	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code

Papers

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality (2023-05-23): source of the MosaiCLIP, NegCLIP, CLIP, and CLIP-FT entries. No code linked.
CREPE: Can Vision-Language Foundation Models Reason Compositionally? (2022-12-13): source of the LAION400M, CC12M, YFCC15M, and Random entries. Code linked.

SOTA markers: ViT-L-14 (LAION400M) at 60.78 (2022-12-13), then RN-50 (MosaiCLIP, CC-12M) at 92.6 (2023-05-23).