10-shot image generation on ADE20K

Metric: Params (M) (higher is better)

Results

#	Model	Params (M)	Extra Data	Paper	Date	Code
1	FD-SwinV2-G	3000	No	Contrastive Learning Rivals Masked Image Modelin...	2022-05-27	Code
2	RevCol-H (Mask2Former)	2439	Yes	Reversible Column Networks	2022-12-22	Code
3	BEiT-3	1900	Yes	Image as a Foreign Language: BEiT Pretraining fo...	2022-08-22	Code
4	ViT-P (InternImage-H)	1610	Yes	The Missing Point in Vision Transformers for Uni...	2025-05-26	Code
5	ONE-PEACE	1500	Yes	ONE-PEACE: Exploring One General Representation ...	2023-05-18	Code
6	ViT-P (OneFormer, InternImage-H)	1400	No	The Missing Point in Vision Transformers for Uni...	2025-05-26	Code
7	InternImage-H	1310	Yes	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
8	M3I Pre-training (InternImage-H)	1310	Yes	Towards All-in-one Pre-training via Maximizing M...	2022-11-17	Code
9	InternImage-H (M3I Pre-training)	1310	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
10	DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2Former)	1080	No	DINOv2: Learning Robust Visual Features without ...	2023-04-14	Code
11	EVA	1074	Yes	EVA: Exploring the Limits of Masked Visual Repre...	2022-11-14	Code
12	ViT-Adapter-L (Mask2Former, BEiTv2 pretrain)	571	Yes	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
13	ViT-Adapter-L (Mask2Former, BEiT pretrain)	571	Yes	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
14	MOAT-4 (IN-22K pretraining, single-scale)	496	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
15	ViT-Adapter-L (UperNet, BEiT pretrain)	451	No	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
16	ConvNeXt-XL++	391	No	A ConvNet for the 2020s	2022-01-10	Code
17	InternImage-XL	368	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
18	RSSeg-ViT-L (BEiT pretrain)	330	No	Representation Separation for Semantic Segmentat...	2022-12-28	-
19	EoMT (DINOv2-L, single-scale, 512x512)	316	No	Your ViT is Secretly an Image Segmentation Model	2025-03-24	Code
20	ViT-P (OneFormer, DiNAT-L)	309	No	The Missing Point in Vision Transformers for Uni...	2025-05-26	Code
21	InternImage-L	256	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
22	ConvNeXt-L++	235	No	A ConvNet for the 2020s	2022-01-10	Code
23	Mask DINO (SwinL, multi-scale)	223	Yes	Mask DINO: Towards A Unified Transformer-based F...	2022-06-06	Code
24	Sequential Ensemble (SegFormer)	216.3	No	Sequential Ensembling for Semantic Segmentation	2022-10-08	-
25	LV-ViT-L (UperNet, MS)	209	No	All Tokens Matter: Token Labeling for Training B...	2021-04-22	Code
26	DDP (Swin-L, step-3)	207	No	DDP: Diffusion Model for Dense Visual Prediction	2023-03-30	Code
27	MOAT-3 (IN-22K pretraining, single-scale)	198	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
28	InternImage-B	128	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
29	GC ViT-B	125	No	Global Context Vision Transformers	2022-06-20	Code
30	NAT-Base	123	No	Neighborhood Attention Transformer	2022-04-14	Code
31	ConvNeXt-B++	122	No	A ConvNet for the 2020s	2022-01-10	Code
32	ConvNeXt-B	122	No	A ConvNet for the 2020s	2022-01-10	Code
33	DAT-B (UperNet)	121	No	Vision Transformer with Deformable Attention	2022-01-03	Code
34	TransNeXt-Base (IN-1K pretrain, Mask2Former, 512)	109	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
35	ActiveMLP-L (UperNet)	108	No	Active Token Mixer	2022-03-11	Code
36	SeMask (SeMask Swin-B FPN)	96	No	SeMask: Semantically Masked Transformers for Sem...	2021-12-23	Code
37	SegFormer-B5	84.7	Yes	SegFormer: Simple and Efficient Design for Seman...	2021-05-31	Code
38	GC ViT-S	84	No	Global Context Vision Transformers	2022-06-20	Code
39	ConvNeXt-S	82	No	A ConvNet for the 2020s	2022-01-10	Code
40	NAT-Small	82	No	Neighborhood Attention Transformer	2022-04-14	Code
41	MOAT-2 (IN-22K pretraining, single-scale)	81	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
42	DAT-S (UperNet)	81	No	Vision Transformer with Deformable Attention	2022-01-03	Code
43	InternImage-S	80	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
44	TransNeXt-Small (IN-1K pretrain, Mask2Former, 512)	69	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
45	SegFormer-B4	64.1	Yes	SegFormer: Simple and Efficient Design for Seman...	2021-05-31	Code
46	Light-Ham (VAN-Huge)	61.1	No	Is Attention Better Than Matrix Decomposition?	2021-09-09	Code
47	ConvNeXt-T	60	No	A ConvNet for the 2020s	2022-01-10	Code
48	DAT-T (UperNet)	60	No	Vision Transformer with Deformable Attention	2022-01-03	Code
49	InternImage-T	59	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
50	NAT-Tiny	58	No	Neighborhood Attention Transformer	2022-04-14	Code
51	GC ViT-T	58	No	Global Context Vision Transformers	2022-06-20	Code
52	SeMask (SeMask Swin-S FPN)	56	No	SeMask: Semantically Masked Transformers for Sem...	2021-12-23	Code
53	VAN-Large (HamNet)	55	No	Visual Attention Network	2022-02-20	Code
54	NAT-Mini	50	No	Neighborhood Attention Transformer	2022-04-14	Code
55	VAN-Large	49	No	Visual Attention Network	2022-02-20	Code
56	TransNeXt-Tiny (IN-1K pretrain, Mask2Former, 512)	47.5	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
57	Light-Ham (VAN-Large)	45.6	No	Is Attention Better Than Matrix Decomposition?	2021-09-09	Code
58	SeMask (SeMask Swin-T FPN)	35	No	SeMask: Semantically Masked Transformers for Sem...	2021-12-23	Code
59	HRViT-b3 (SegFormer, SS)	28.7	No	Multi-Scale High-Resolution Vision Transformer f...	2021-11-01	Code
60	Light-Ham (VAN-Base)	27.4	No	Is Attention Better Than Matrix Decomposition?	2021-09-09	Code
61	tiny-MOAT-3 (IN-1K pretraining, single scale)	24	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
62	HRViT-b2 (SegFormer, SS)	20.8	No	Multi-Scale High-Resolution Vision Transformer f...	2021-11-01	Code
63	VAN-Small	18	No	Visual Attention Network	2022-02-20	Code
64	Light-Ham (VAN-Small, D=256)	13.8	No	Is Attention Better Than Matrix Decomposition?	2021-09-09	Code
65	tiny-MOAT-2 (IN-1K pretraining, single scale)	13	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
66	HRViT-b1 (SegFormer, SS)	8.2	No	Multi-Scale High-Resolution Vision Transformer f...	2021-11-01	Code
67	tiny-MOAT-1 (IN-1K pretraining, single scale)	8	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
68	VAN-Tiny	8	No	Visual Attention Network	2022-02-20	Code
69	tiny-MOAT-0 (IN-1K pretraining, single scale)	6	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
70	SegFormer-B0	3.8	Yes	SegFormer: Simple and Efficient Design for Seman...	2021-05-31	Code