Image Retrieval on CREPE (Compositional REPresentation Evaluation)

Metric: Recall@1 (HN-Atom, UC) (higher is better)
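
As a rough illustration of the metric, Recall@1 is the percentage of queries whose top-scored candidate is the ground-truth match. The sketch below is not the benchmark's evaluation code. It assumes each query is scored against a small candidate set with exactly one correct item, and the names `recall_at_1`, `chance_level`, `toy_scores`, and `toy_gt` are hypothetical. The Random row's score of 20 is consistent with picking one of five candidates uniformly at random, but that candidate count is an inference from the number, not something stated on this page.

```python
from typing import Sequence


def recall_at_1(scores: Sequence[Sequence[float]], gt_index: Sequence[int]) -> float:
    """Percentage of queries whose highest-scoring candidate is the ground truth.

    scores[i][j] is the similarity between query i and candidate j.
    gt_index[i] is the position of the correct candidate for query i.
    """
    if not scores or len(scores) != len(gt_index):
        raise ValueError("scores and gt_index must be non-empty and equally long")
    hits = 0
    for row, gt in zip(scores, gt_index):
        # Index of the best-scoring candidate (first one wins on ties).
        top1 = max(range(len(row)), key=row.__getitem__)
        hits += int(top1 == gt)
    return 100.0 * hits / len(scores)


def chance_level(n_candidates: int) -> float:
    """Expected Recall@1 (in percent) of a uniformly random ranking."""
    return 100.0 / n_candidates


# Toy data (made up): 4 queries, 5 candidates each, ground truth at index 0.
toy_scores = [
    [0.9, 0.1, 0.2, 0.3, 0.4],  # correct: index 0 scores highest
    [0.2, 0.8, 0.1, 0.3, 0.4],  # wrong: index 1 scores highest
    [0.5, 0.1, 0.2, 0.3, 0.4],  # correct
    [0.1, 0.2, 0.3, 0.9, 0.4],  # wrong: index 3 scores highest
]
toy_gt = [0, 0, 0, 0]

print(recall_at_1(toy_scores, toy_gt))  # 50.0
print(chance_level(5))                  # 20.0
```

Under this assumption, any score in the table is best read against the 20-point chance floor rather than against zero.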

Results

#	Model	Recall@1 (HN-Atom, UC)	Extra Data	Paper	Date	Code
1	ViT-L-14 (LAION400M)	47.86	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
2	ViT-B-16+240 (LAION400M)	46.53	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
3	ViT-B-16 (LAION400M)	44.93	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
4	Swin-T (MosaiCLIP, CC-12M)	44.5	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
5	RN-50 (MosaiCLIP, CC-12M)	44.4	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
6	ViT-B-32 (LAION400M)	42.75	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
7	MosaiCLIP (YFCC-FT)	41.5	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
8	RN-50 (NegCLIP, CC-12M)	41.4	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
9	MosaiCLIP (CC-FT)	40.9	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
10	RN50 (YFCC15M)	39.85	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
11	Swin-T (NegCLIP, CC-12M)	39.6	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
12	RN101 (YFCC15M)	39.5	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
13	CLIP (YFCC-FT)	39.5	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
14	NegCLIP (YFCC-FT)	39	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
15	CLIP-FT (YFCC-FT)	38.3	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
16	NegCLIP (CC-FT)	37.5	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
17	Swin-T (CLIP, CC-12M)	37.3	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
18	RN-50 (CLIP, CC-12M)	36.7	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
19	CLIP-FT (CC-FT)	35.6	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
20	CLIP (CC-FT)	35	No	Coarse-to-Fine Contrastive Learning in Image-Tex...	2023-05-23	-
21	RN50 (CC12M)	34.88	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code
22	Random	20	No	CREPE: Can Vision-Language Foundation Models Rea...	2022-12-13	Code

Papers

CREPE: Can Vision-Language Foundation Models Reason Compositionally? (2022-12-13; code available)
Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality (2023-05-23; no code listed)