10-shot image generation on COCO minival
Metric: AP (higher is better)
Results
| # | Model | AP | Extra Data | SOTA | Paper | Date | Code |
|---|-------|----|------------|------|-------|------|------|
| 1 | OpenSeeD (SwinL, single-scale) | 53.2 | Yes | SOTA | A Simple Framework for Open-Vocabulary Segmentation and Detection | 2023-03-14 | Yes |
| 2 | OneFormer (InternImage-H, single-scale) | 52 | No | SOTA | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | Yes |
| 3 | Mask DINO (SwinL, single-scale) | 50.9 | Yes | SOTA | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | 2022-06-06 | Yes |
| 4 | UMG-CLIP-E/14 | 50.7 | Yes | | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Yes |
| 5 | UMG-CLIP-L/14 | 49.7 | Yes | | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Yes |
| 6 | DiNAT-L (single-scale, Mask2Former) | 49.2 | No | | Dilated Neighborhood Attention Transformer | 2022-09-29 | Yes |
| 7 | OneFormer (DiNAT-L, single-scale) | 49.2 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | Yes |
| 8 | OneFormer (Swin-L, single-scale) | 49 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | Yes |
| 9 | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | 48.9 | No | SOTA | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Yes |
| 10 | Mask2Former (single-scale) | 48.6 | No | SOTA | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 | Yes |
| 11 | FocalNet-L (Mask2Former (200 queries)) | 48.4 | No | | Focal Modulation Networks | 2022-03-22 | Yes |
| 12 | PanopticFPN++ | 39.7 | No | SOTA | End-to-End Object Detection with Transformers | 2020-05-26 | Yes |
| 13 | DETR-R101 (ResNet-101) | 33 | No | | End-to-End Object Detection with Transformers | 2020-05-26 | Yes |