OneFormer: One Transformer to Rule Universal Image Segmentation

Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

2022-11-10 · CVPR 2023
Tasks: Scene Parsing, Panoptic Segmentation, Segmentation, Semantic Segmentation, Instance Segmentation

Abstract

Universal image segmentation is not a new concept. Past attempts to unify image segmentation over the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation, because they need to be trained individually on semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. First, we propose a task-conditioned joint training strategy that enables training on ground truths of each domain (semantic, instance, and panoptic segmentation) within a single multi-task training process. Second, we introduce a task token to condition our model on the task at hand, making our model task-dynamic to support multi-task training and inference. Third, we propose using a query-text contrastive loss during training to establish better inter-task and inter-class distinctions. Notably, our single OneFormer model outperforms specialized Mask2Former models across all three segmentation tasks on ADE20K, Cityscapes, and COCO, despite the latter being trained on each of the three tasks individually with three times the resources. With new ConvNeXt and DiNAT backbones, we observe further performance improvements. We believe OneFormer is a significant step towards making image segmentation more universal and accessible. To support further research, we open-source our code and models at https://github.com/SHI-Labs/OneFormer
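The three training ideas in the abstract can be illustrated with a small, framework-free sketch. It is not the released OneFormer code: it assumes a hypothetical annotation format (`Segment`: pixel indices, class id, thing/stuff flag), derives semantic, instance, and panoptic targets from the same panoptic annotation, samples one task uniformly per image, and builds the task-token text from an assumed `"the task is {task}"` template. The `query_text_contrastive_loss` function is a generic symmetric InfoNCE-style loss between object queries and text queries; the temperature `tau=0.07` and the one-to-one pairing are assumptions. All names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

TASKS = ("semantic", "instance", "panoptic")


@dataclass(frozen=True)
class Segment:
    """Hypothetical panoptic annotation: one segment with its pixel indices."""
    pixels: FrozenSet[int]
    class_id: int
    is_thing: bool


def task_text(task: str) -> str:
    # Assumed template for the text that is tokenized into the task token.
    return f"the task is {task}"


def derive_targets(segments: List[Segment], task: str) -> List[Tuple[int, FrozenSet[int]]]:
    """Derive (class_id, mask) targets for one task from panoptic segments."""
    if task == "panoptic":
        # Panoptic: every thing instance and every stuff region is kept as-is.
        return [(s.class_id, s.pixels) for s in segments]
    if task == "instance":
        # Instance: only countable "thing" segments are targets.
        return [(s.class_id, s.pixels) for s in segments if s.is_thing]
    if task == "semantic":
        # Semantic: all segments of the same class merge into one mask.
        merged: Dict[int, Set[int]] = {}
        for s in segments:
            merged.setdefault(s.class_id, set()).update(s.pixels)
        return [(c, frozenset(p)) for c, p in sorted(merged.items())]
    raise ValueError(f"unknown task: {task}")


def sample_training_example(segments: List[Segment], rng: random.Random):
    # One task is drawn uniformly per image, so a single run covers all three.
    task = rng.choice(TASKS)
    return task_text(task), derive_targets(segments, task)


def _l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _mean_cross_entropy(logits: List[List[float]]) -> float:
    # Row i's correct "class" is column i (its matching pair).
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def query_text_contrastive_loss(obj_queries, text_queries, tau: float = 0.07) -> float:
    """Symmetric InfoNCE loss; obj_queries[i] and text_queries[i] are a positive pair."""
    q = [_l2_normalize(v) for v in obj_queries]
    t = [_l2_normalize(v) for v in text_queries]
    sim = [[sum(a * b for a, b in zip(qi, tj)) / tau for tj in t] for qi in q]
    sim_t = [list(col) for col in zip(*sim)]  # text -> query direction
    return _mean_cross_entropy(sim) + _mean_cross_entropy(sim_t)
```

In a real model the queries would be learned embeddings and the contrastive term would be added to the mask and classification losses; the sketch only shows how one panoptic annotation can serve all three tasks and how matched query/text pairs are pulled together while mismatched pairs are pushed apart.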

Results

Task	Dataset	Metric	Value (%)	Model
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	68.8	OneFormer (InternImage-H, emb_dim=1024, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	67.4	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	64.9	OneFormer (DiNAT-L, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	85.8	OneFormer (ConvNeXt-XL, Mapillary, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-XL, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	84.4	OneFormer (Swin-L, multi-scale)
Semantic Segmentation	ADE20K val	mIoU	60.8	OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	58.6	OneFormer (DiNAT-L, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	58.4	OneFormer (DiNAT-L, multi-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (Swin-L, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	57.7	OneFormer (Swin-L, multi-scale, 640x640)
Semantic Segmentation	Cityscapes test	PQ	68	OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-Pretrained)
Semantic Segmentation	Cityscapes val	AP	48.7	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	PQ	70.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	PQst	74.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	PQth	64.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	AP	46.5	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Cityscapes val	PQ	68.51	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Cityscapes val	AP	46.7	OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation	Cityscapes val	PQ	68.4	OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83.6	OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation	Cityscapes val	AP	45.6	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Cityscapes val	PQ	67.6	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Cityscapes val	AP	45.6	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Cityscapes val	PQ	67.2	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Mapillary val	PQ	46.7	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	PQst	54.9	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	PQth	40.5	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	61.7	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	PQ	46.4	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Mapillary val	PQst	54	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Mapillary val	PQth	40.6	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	61.6	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	ADE20K val	AP	40.2	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation	ADE20K val	PQ	54.5	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	60.4	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation	ADE20K val	PQ	53.4	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Semantic Segmentation	ADE20K val	mIoU	58.9	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Semantic Segmentation	ADE20K val	AP	37.1	OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	PQ	51.5	OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	AP	37.8	OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	PQ	51.4	OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	AP	36	OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	50.5	OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	AP	36.3	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	50.1	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	57.4	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation	ADE20K val	AP	36.2	OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	50	OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	56.6	OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	AP	35.9	OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	49.8	OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation	COCO minival	AP	52	OneFormer (InternImage-H, single-scale)
Semantic Segmentation	COCO minival	PQ	60	OneFormer (InternImage-H, single-scale)
Semantic Segmentation	COCO minival	PQst	49.2	OneFormer (InternImage-H, single-scale)
Semantic Segmentation	COCO minival	PQth	67.1	OneFormer (InternImage-H, single-scale)
Semantic Segmentation	COCO minival	mIoU	68.8	OneFormer (InternImage-H, single-scale)
Semantic Segmentation	COCO minival	AP	49.2	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	PQ	58	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	PQst	48.4	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	PQth	64.3	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	AP	49	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	PQ	57.9	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	PQst	48	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	PQth	64.4	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	mIoU	67.4	OneFormer (Swin-L, single-scale)
Instance Segmentation	Cityscapes val	mask AP	48.7	OneFormer (ConvNeXt-L, single-scale, Mapillary-Pretrained)
Instance Segmentation	Cityscapes val	mask AP	45.6	OneFormer (DiNAT-L, single-scale)
Instance Segmentation	Cityscapes val	mask AP	45.6	OneFormer (Swin-L, single-scale)
Instance Segmentation	COCO val (panoptic labels)	AP	52	OneFormer (InternImage-H, emb_dim=1024, single-scale)
Instance Segmentation	COCO val (panoptic labels)	AP	49.2	OneFormer (DiNAT-L, single-scale)
Instance Segmentation	COCO val (panoptic labels)	AP	49	OneFormer (Swin-L, single-scale)
Instance Segmentation	ADE20K val	AP	44.2	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	APL	64.3	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	APM	49.9	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	APS	23.7	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	AP	40.2	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	APL	59.7	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	APM	44.4	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	APS	19.2	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	AP	36	OneFormer (DiNAT-L, single-scale)
Instance Segmentation	ADE20K val	AP	35.9	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Cityscapes test	PQ	68	OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-Pretrained)
Panoptic Segmentation	Cityscapes val	AP	48.7	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	PQ	70.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	PQst	74.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	PQth	64.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	AP	46.5	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	68.51	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Cityscapes val	AP	46.7	OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	68.4	OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83.6	OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation	Cityscapes val	AP	45.6	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	67.6	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83.1	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Cityscapes val	AP	45.6	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	67.2	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Mapillary val	PQ	46.7	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	PQst	54.9	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	PQth	40.5	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	mIoU	61.7	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	PQ	46.4	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Mapillary val	PQst	54	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Mapillary val	PQth	40.6	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Mapillary val	mIoU	61.6	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	ADE20K val	AP	40.2	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation	ADE20K val	PQ	54.5	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation	ADE20K val	mIoU	60.4	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation	ADE20K val	PQ	53.4	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Panoptic Segmentation	ADE20K val	mIoU	58.9	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Panoptic Segmentation	ADE20K val	AP	37.1	OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	PQ	51.5	OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	AP	37.8	OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	PQ	51.4	OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	AP	36	OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	50.5	OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	AP	36.3	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	50.1	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	57.4	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	AP	36.2	OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	50	OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	56.6	OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	AP	35.9	OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	49.8	OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation	COCO minival	AP	52	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	PQ	60	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	PQst	49.2	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	PQth	67.1	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	mIoU	68.8	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	AP	49.2	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	PQ	58	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	PQst	48.4	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	PQth	64.3	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	AP	49	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	PQ	57.9	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	PQst	48	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	PQth	64.4	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	mIoU	67.4	OneFormer (Swin-L, single-scale)
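The PQ, PQst, and PQth columns report panoptic quality over all classes, over stuff classes, and over thing classes. A minimal sketch of the standard definition follows: a predicted and a ground-truth segment of the same class match when their IoU exceeds 0.5, and per-class PQ = (sum of matched IoUs) / (|TP| + 0.5|FP| + 0.5|FN|), averaged over classes. The sketch scores a single image and ignores void and crowd regions, whereas official evaluation accumulates TP/FP/FN per class over the whole dataset; the `Segment` format and function names are hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, List, Set, Tuple

Segment = Tuple[int, FrozenSet[int]]  # (class_id, pixel indices); hypothetical format


def iou(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds: List[Segment], gts: List[Segment], thing_classes: Set[int]):
    """Return (PQ, PQ_th, PQ_st) in percent, ignoring void and crowd regions."""
    iou_sum = defaultdict(float)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    matched_preds = set()
    for g_cls, g_mask in gts:
        for p_idx, (p_cls, p_mask) in enumerate(preds):
            if p_cls != g_cls or p_idx in matched_preds:
                continue
            overlap = iou(g_mask, p_mask)
            # IoU > 0.5 makes the match unique for non-overlapping segments.
            if overlap > 0.5:
                iou_sum[g_cls] += overlap
                tp[g_cls] += 1
                matched_preds.add(p_idx)
                break
        else:
            fn[g_cls] += 1  # no prediction matched this ground-truth segment
    for p_idx, (p_cls, _) in enumerate(preds):
        if p_idx not in matched_preds:
            fp[p_cls] += 1  # unmatched prediction
    classes = set(tp) | set(fp) | set(fn)

    def pq(c: int) -> float:
        return iou_sum[c] / (tp[c] + 0.5 * fp[c] + 0.5 * fn[c])

    def mean_pq(cs) -> float:
        cs = list(cs)
        return 100.0 * sum(pq(c) for c in cs) / len(cs) if cs else float("nan")

    things = [c for c in classes if c in thing_classes]
    stuff = [c for c in classes if c not in thing_classes]
    return mean_pq(classes), mean_pq(things), mean_pq(stuff)
```

For example, a thing class matched at IoU 0.75 plus a stuff class with one missed and one spurious segment gives PQ = 37.5, PQth = 75.0, and PQst = 0.0.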

Abstract

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	68.8	OneFormer (InternImage-H, emb_dim=1024, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	67.4	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	64.9	OneFormer (DiNAT-L, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	85.8	OneFormer (ConvNeXt-XL, Mapillary, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-XL, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	84.4	OneFormer (Swin-L, multi-scale)
Semantic Segmentation	ADE20K val	mIoU	60.8	OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	58.6	OneFormer (DiNAT-L, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	58.4	OneFormer (DiNAT-L, multi-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (Swin-L, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	57.7	OneFormer (Swin-L, multi-scale, 640x640)
Semantic Segmentation	Cityscapes test	PQ	68	OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-Pretrained)
Semantic Segmentation	Cityscapes val	AP	48.7	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	PQ	70.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	PQst	74.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	PQth	64.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation	Cityscapes val	AP	46.5	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Cityscapes val	PQ	68.51	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Cityscapes val	AP	46.7	OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation	Cityscapes val	PQ	68.4	OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83.6	OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation	Cityscapes val	AP	45.6	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Cityscapes val	PQ	67.6	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Cityscapes val	AP	45.6	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Cityscapes val	PQ	67.2	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Cityscapes val	mIoU	83	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Mapillary val	PQ	46.7	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	PQst	54.9	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	PQth	40.5	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	61.7	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	Mapillary val	PQ	46.4	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Mapillary val	PQst	54	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Mapillary val	PQth	40.6	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	61.6	OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation	ADE20K val	AP	40.2	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation	ADE20K val	PQ	54.5	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	60.4	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation	ADE20K val	PQ	53.4	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Semantic Segmentation	ADE20K val	mIoU	58.9	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Semantic Segmentation	ADE20K val	AP	37.1	OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	PQ	51.5	OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	AP	37.8	OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	PQ	51.4	OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation	ADE20K val	AP	36	OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	50.5	OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	AP	36.3	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	50.1	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	57.4	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation	ADE20K val	AP	36.2	OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	50	OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	56.6	OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	AP	35.9	OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	PQ	49.8	OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation	COCO minival	AP	52	OneFormer (InternImage-H,single-scale)
Semantic Segmentation	COCO minival	PQ	60	OneFormer (InternImage-H,single-scale)
Semantic Segmentation	COCO minival	PQst	49.2	OneFormer (InternImage-H,single-scale)
Semantic Segmentation	COCO minival	PQth	67.1	OneFormer (InternImage-H,single-scale)
Semantic Segmentation	COCO minival	mIoU	68.8	OneFormer (InternImage-H,single-scale)
Semantic Segmentation	COCO minival	AP	49.2	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	PQ	58	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	PQst	48.4	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	PQth	64.3	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO minival	AP	49	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	PQ	57.9	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	PQst	48	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	PQth	64.4	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO minival	mIoU	67.4	OneFormer (Swin-L, single-scale)
Instance Segmentation	Cityscapes val	mask AP	48.7	OneFormer (ConvNeXt-L, single-scale, Mapillary-Pretrained)
Instance Segmentation	Cityscapes val	mask AP	45.6	OneFormer (DiNAT-L, single-scale)
Instance Segmentation	Cityscapes val	mask AP	45.6	OneFormer (Swin-L, single-scale)
Instance Segmentation	COCO val (panoptic labels)	AP	52	OneFormer (InternImage-H, emb_dim=1024, single-scale)
Instance Segmentation	COCO val (panoptic labels)	AP	49.2	OneFormer (DiNAT-L, single-scale)
Instance Segmentation	COCO val (panoptic labels)	AP	49	OneFormer (Swin-L, single-scale)
Instance Segmentation	ADE20K val	AP	44.2	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	APL	64.3	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	APM	49.9	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	APS	23.7	OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation	ADE20K val	AP	40.2	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	APL	59.7	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	APM	44.4	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	APS	19.2	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation	ADE20K val	AP	36	OneFormer (DiNAT-L, single-scale)
Instance Segmentation	ADE20K val	AP	35.9	OneFormer (Swin-L, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	68.8	OneFormer (InternImage-H, emb_dim=1024, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Semantic Segmentation	COCO (Common Objects in Context)	mIoU	67.4	OneFormer (Swin-L, single-scale)
Semantic Segmentation	Mapillary val	mIoU	64.9	OneFormer (DiNAT-L, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	85.8	OneFormer (ConvNeXt-XL, Mapillary, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-XL, multi-scale)
Semantic Segmentation	Cityscapes val	mIoU	84.4	OneFormer (Swin-L, multi-scale)
Semantic Segmentation	ADE20K val	mIoU	60.8	OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	58.6	OneFormer (DiNAT-L, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	58.4	OneFormer (DiNAT-L, multi-scale, 640x640)
Semantic Segmentation	ADE20K val	mIoU	58.3	OneFormer (Swin-L, multi-scale, 896x896)
Semantic Segmentation	ADE20K val	mIoU	57.7	OneFormer (Swin-L, multi-scale, 640x640)
Panoptic Segmentation	Cityscapes test	PQ	68	OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-Pretrained)
Panoptic Segmentation	Cityscapes val	AP	48.7	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	PQ	70.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	PQst	74.1	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	PQth	64.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	mIoU	84.6	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation	Cityscapes val	AP	46.5	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	68.51	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Cityscapes val	AP	46.7	OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	68.4	OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83.6	OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation	Cityscapes val	AP	45.6	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	67.6	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83.1	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Cityscapes val	AP	45.6	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Cityscapes val	PQ	67.2	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Cityscapes val	mIoU	83	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	Mapillary val	PQ	46.7	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	PQst	54.9	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	PQth	40.5	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	mIoU	61.7	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	Mapillary val	PQ	46.4	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Mapillary val	PQst	54	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Mapillary val	PQth	40.6	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	Mapillary val	mIoU	61.6	OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation	ADE20K val	AP	40.2	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation	ADE20K val	PQ	54.5	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation	ADE20K val	mIoU	60.4	OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation	ADE20K val	PQ	53.4	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Panoptic Segmentation	ADE20K val	mIoU	58.9	OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Panoptic Segmentation	ADE20K val	AP	37.1	OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	PQ	51.5	OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	AP	37.8	OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	PQ	51.4	OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation	ADE20K val	AP	36	OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	50.5	OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	58.3	OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	AP	36.3	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	50.1	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	57.4	OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	AP	36.2	OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	50	OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	56.6	OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	AP	35.9	OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	PQ	49.8	OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation	ADE20K val	mIoU	57	OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation	COCO minival	AP	52	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	PQ	60	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	PQst	49.2	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	PQth	67.1	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	mIoU	68.8	OneFormer (InternImage-H, single-scale)
Panoptic Segmentation	COCO minival	AP	49.2	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	PQ	58	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	PQst	48.4	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	PQth	64.3	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	mIoU	68.1	OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation	COCO minival	AP	49	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	PQ	57.9	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	PQst	48	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	PQth	64.4	OneFormer (Swin-L, single-scale)
Panoptic Segmentation	COCO minival	mIoU	67.4	OneFormer (Swin-L, single-scale)
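
The panoptic rows report PQ together with its thing (PQth) and stuff (PQst) components. Because PQ is averaged over classes, the overall PQ should match the class-count-weighted mean of PQth and PQst. The minimal sketch below checks this against several rows above. The class splits (COCO panoptic 80 thing / 53 stuff, Cityscapes 8 / 11, Mapillary Vistas v1.2 37 / 28) are assumed standard benchmark definitions that are not stated on this page, and `overall_pq` is a hypothetical helper written for illustration.

```python
# Sketch: recover overall PQ from PQ^th and PQ^st, assuming PQ is the
# unweighted mean of per-class PQ (so the split is weighted by class count).

def overall_pq(pq_things: float, pq_stuff: float, n_things: int, n_stuff: int) -> float:
    """Combine thing/stuff PQ into overall PQ, weighting each by its class count."""
    total = n_things + n_stuff
    return (n_things * pq_things + n_stuff * pq_stuff) / total

# Assumed thing/stuff class counts for each benchmark (not taken from this page).
CLASS_SPLITS = {
    "COCO minival": (80, 53),
    "Cityscapes val": (8, 11),
    "Mapillary val": (37, 28),
}

# (dataset, model, PQth, PQst, reported PQ) copied from the results rows above.
ROWS = [
    ("COCO minival", "InternImage-H", 67.1, 49.2, 60.0),
    ("COCO minival", "DiNAT-L", 64.3, 48.4, 58.0),
    ("COCO minival", "Swin-L", 64.4, 48.0, 57.9),
    ("Cityscapes val", "ConvNeXt-L, Mapillary Vistas-pretrained", 64.6, 74.1, 70.1),
    ("Mapillary val", "DiNAT-L", 40.5, 54.9, 46.7),
    ("Mapillary val", "ConvNeXt-L", 40.6, 54.0, 46.4),
]

if __name__ == "__main__":
    for dataset, model, pq_th, pq_st, reported in ROWS:
        n_th, n_st = CLASS_SPLITS[dataset]
        recomputed = overall_pq(pq_th, pq_st, n_th, n_st)
        # Reported values are rounded to one decimal, so small gaps are expected.
        print(f"{dataset:15s} {model:40s} recomputed={recomputed:.2f} reported={reported}")
```

Agreement to within rounding error is a quick sanity check that a row's PQ, PQth and PQst were transcribed consistently; a large gap would suggest a copy error or a different class split.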