| Rank | Model | Score | Extra Training Data | Paper | Date | Code |
|------|-------|-------|---------------------|-------|------|------|
| 1 | HyperSeg (Swin-B) | 61.2 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 2 | OneFormer (InternImage-H, single-scale) | 60.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 3 | OpenSeeD (Swin-L, single-scale) | 59.5 | Yes | A Simple Framework for Open-Vocabulary Segmentat... | 2023-03-14 | Code |
| 4 | UMG-CLIP-E/14 | 59.5 | Yes | UMG-CLIP: A Unified Multi-Granularity Vision Gen... | 2024-01-12 | Code |
| 5 | Mask DINO (Swin-L, single-scale) | 59.4 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code |
| 6 | EoMT (DINOv2-g, single-scale, 1280x1280) | 59.2 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 7 | UMG-CLIP-L/14 | 58.9 | Yes | UMG-CLIP: A Unified Multi-Granularity Vision Gen... | 2024-01-12 | Code |
| 8 | DiNAT-L (single-scale, Mask2Former) | 58.5 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 9 | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | 58.4 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 10 | Visual Attention Network (VAN-B6 + Mask2Former) | 58.2 | No | Visual Attention Network | 2022-02-20 | Code |
| 11 | kMaX-DeepLab (single-scale, pseudo-labels) | 58.1 | Yes | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 12 | HIPIE (ViT-H, single-scale) | 58.1 | Yes | Hierarchical Open-vocabulary Universal Image Seg... | 2023-07-03 | Code |
| 13 | kMaX-DeepLab (single-scale, drop query with 256 queries) | 58.0 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 14 | OneFormer (DiNAT-L, single-scale) | 58.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 15 | kMaX-DeepLab (single-scale) | 57.9 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 16 | OneFormer (Swin-L, single-scale) | 57.9 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 17 | FocalNet-L (Mask2Former (200 queries)) | 57.9 | No | Focal Modulation Networks | 2022-03-22 | Code |
| 18 | Mask2Former (single-scale) | 57.8 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 19 | Panoptic SegFormer (single-scale) | 55.8 | No | Panoptic SegFormer: Delving Deeper into Panoptic... | 2021-09-08 | Code |
| 20 | CMT-DeepLab (single-scale) | 55.3 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 21 | MaskFormer (single-scale) | 52.7 | No | Per-Pixel Classification is Not All You Need for... | 2021-07-13 | Code |
| 22 | MaX-DeepLab-L (single-scale) | 51.1 | No | MaX-DeepLab: End-to-End Panoptic Segmentation wi... | 2020-12-01 | Code |
| 23 | Panoptic SegFormer (ResNet-101) | 50.6 | No | Panoptic SegFormer: Delving Deeper into Panoptic... | 2021-09-08 | Code |
| 24 | PanopticFPN + ResNeSt (single-scale) | 47.9 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 25 | DETR-R101 (ResNet-101) | 45.1 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 26 | Panoptic FCN* (ResNet-50-FPN) | 44.3 | No | Fully Convolutional Networks for Panoptic Segmen... | 2020-12-01 | Code |
| 27 | PanopticFPN++ | 44.1 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 28 | Axial-DeepLab-L (multi-scale) | 43.9 | No | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 29 | Axial-DeepLab-L (single-scale) | 43.4 | No | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |