Panoptic segmentation on COCO minival

Metric: AP (higher is better)

Results

#	Model	AP	Extra Data	SOTA	Paper	Date	Code
1	OpenSeeD (SwinL, single-scale)	53.2	Yes	Yes	A Simple Framework for Open-Vocabulary Segmentation and Detection	2023-03-14	Code
2	OneFormer (InternImage-H, single-scale)	52	No	Yes	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
3	Mask DINO (SwinL, single-scale)	50.9	Yes	Yes	Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation	2022-06-06	Code
4	UMG-CLIP-E/14	50.7	Yes	No	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	2024-01-12	Code
5	UMG-CLIP-L/14	49.7	Yes	No	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	2024-01-12	Code
6	DiNAT-L (single-scale, Mask2Former)	49.2	No	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
7	OneFormer (DiNAT-L, single-scale)	49.2	No	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
8	OneFormer (Swin-L, single-scale)	49	No	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
9	ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former)	48.9	No	Yes	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
10	Mask2Former (single-scale)	48.6	No	Yes	Masked-attention Mask Transformer for Universal Image Segmentation	2021-12-02	Code
11	FocalNet-L (Mask2Former (200 queries))	48.4	No	No	Focal Modulation Networks	2022-03-22	Code
12	PanopticFPN++	39.7	No	Yes	End-to-End Object Detection with Transformers	2020-05-26	Code
13	DETR-R101 (ResNet-101)	33	No	No	End-to-End Object Detection with Transformers	2020-05-26	Code