Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

OneFormer: One Transformer to Rule Universal Image Segmentation

Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

2022-11-10 | CVPR 2023
Tasks: Scene Parsing, Panoptic Segmentation, Segmentation, Semantic Segmentation, Instance Segmentation

Abstract

Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. We first propose a task-conditioned joint training strategy that enables training on ground truths of each domain (semantic, instance, and panoptic segmentation) within a single multi-task training process. Secondly, we introduce a task token to condition our model on the task at hand, making our model task-dynamic to support multi-task training and inference. Thirdly, we propose using a query-text contrastive loss during training to establish better inter-task and inter-class distinctions. Notably, our single OneFormer model outperforms specialized Mask2Former models across all three segmentation tasks on ADE20k, CityScapes, and COCO, despite the latter being trained on each of the three tasks individually with three times the resources. With new ConvNeXt and DiNAT backbones, we observe even more performance improvement. We believe OneFormer is a significant step towards making image segmentation more universal and accessible. To support further research, we open-source our code and models at https://github.com/SHI-Labs/OneFormer
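
The task conditioning and the query-text contrastive loss described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the uniform task sampling, the prompt template in `task_prompt`, the fixed `temperature=0.07`, and all function names are assumptions made for illustration. The loss is a generic symmetric contrastive (InfoNCE-style) objective in which the i-th query embedding should match the i-th text embedding.

```python
import math
import random

TASKS = ("semantic", "instance", "panoptic")

def sample_task(rng):
    """Pick one task for a training sample (uniform sampling is an assumption)."""
    return rng.choice(TASKS)

def task_prompt(task):
    """Hypothetical text template from which a task token would be derived."""
    return f"the task is {task}"

def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def query_text_contrastive_loss(queries, texts, temperature=0.07):
    """Symmetric contrastive loss between paired query and text embeddings."""
    q = [_normalize(v) for v in queries]
    t = [_normalize(v) for v in texts]
    # Cosine similarity of every query with every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(qi, tj)) / temperature for tj in t]
              for qi in q]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-query direction
    return _cross_entropy_diag(logits) + _cross_entropy_diag(logits_t)

if __name__ == "__main__":
    rng = random.Random(0)
    print(task_prompt(sample_task(rng)))
    aligned = query_text_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[1.0, 0.0], [0.0, 1.0]])
    swapped = query_text_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)
```

The loss is close to zero when each query lines up with its own text embedding and grows when the pairs are swapped, which is the kind of pressure toward inter-task and inter-class distinctions that the abstract refers to.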

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | COCO (Common Objects in Context) | mIoU | 68.8 | OneFormer (InternImage-H, emb_dim=1024, single-scale)
Semantic Segmentation | COCO (Common Objects in Context) | mIoU | 68.1 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | COCO (Common Objects in Context) | mIoU | 67.4 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | Mapillary val | mIoU | 64.9 | OneFormer (DiNAT-L, multi-scale)
Semantic Segmentation | Cityscapes val | mIoU | 85.8 | OneFormer (ConvNeXt-XL, Mapillary, multi-scale)
Semantic Segmentation | Cityscapes val | mIoU | 84.6 | OneFormer (ConvNeXt-XL, multi-scale)
Semantic Segmentation | Cityscapes val | mIoU | 84.4 | OneFormer (Swin-L, multi-scale)
Semantic Segmentation | ADE20K val | mIoU | 60.8 | OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896)
Semantic Segmentation | ADE20K val | mIoU | 58.6 | OneFormer (DiNAT-L, multi-scale, 896x896)
Semantic Segmentation | ADE20K val | mIoU | 58.4 | OneFormer (DiNAT-L, multi-scale, 640x640)
Semantic Segmentation | ADE20K val | mIoU | 58.3 | OneFormer (Swin-L, multi-scale, 896x896)
Semantic Segmentation | ADE20K val | mIoU | 57.7 | OneFormer (Swin-L, multi-scale, 640x640)
Semantic Segmentation | Cityscapes test | PQ | 68 | OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-Pretrained)
Semantic Segmentation | Cityscapes val | AP | 48.7 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation | Cityscapes val | PQ | 70.1 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation | Cityscapes val | PQst | 74.1 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation | Cityscapes val | PQth | 64.6 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation | Cityscapes val | mIoU | 84.6 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Semantic Segmentation | Cityscapes val | AP | 46.5 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | Cityscapes val | PQ | 68.51 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | Cityscapes val | mIoU | 83 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | Cityscapes val | AP | 46.7 | OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation | Cityscapes val | PQ | 68.4 | OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation | Cityscapes val | mIoU | 83.6 | OneFormer (ConvNeXt-XL, single-scale)
Semantic Segmentation | Cityscapes val | AP | 45.6 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Cityscapes val | PQ | 67.6 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Cityscapes val | mIoU | 83.1 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Cityscapes val | AP | 45.6 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | Cityscapes val | PQ | 67.2 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | Cityscapes val | mIoU | 83 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | Mapillary val | PQ | 46.7 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Mapillary val | PQst | 54.9 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Mapillary val | PQth | 40.5 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Mapillary val | mIoU | 61.7 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | Mapillary val | PQ | 46.4 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | Mapillary val | PQst | 54 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | Mapillary val | PQth | 40.6 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | Mapillary val | mIoU | 61.6 | OneFormer (ConvNeXt-L, single-scale)
Semantic Segmentation | ADE20K val | AP | 40.2 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation | ADE20K val | PQ | 54.5 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation | ADE20K val | mIoU | 60.4 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Semantic Segmentation | ADE20K val | PQ | 53.4 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Semantic Segmentation | ADE20K val | mIoU | 58.9 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Semantic Segmentation | ADE20K val | AP | 37.1 | OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation | ADE20K val | PQ | 51.5 | OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation | ADE20K val | mIoU | 58.3 | OneFormer (DiNAT-L, single-scale, 1280x1280)
Semantic Segmentation | ADE20K val | AP | 37.8 | OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation | ADE20K val | PQ | 51.4 | OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation | ADE20K val | mIoU | 57 | OneFormer (Swin-L, single-scale, 1280x1280)
Semantic Segmentation | ADE20K val | AP | 36 | OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | PQ | 50.5 | OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | mIoU | 58.3 | OneFormer (DiNAT-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | AP | 36.3 | OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation | ADE20K val | PQ | 50.1 | OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation | ADE20K val | mIoU | 57.4 | OneFormer (ConvNeXt-XL, single-scale, 640x640)
Semantic Segmentation | ADE20K val | AP | 36.2 | OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | PQ | 50 | OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | mIoU | 56.6 | OneFormer (ConvNeXt-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | AP | 35.9 | OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | PQ | 49.8 | OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation | ADE20K val | mIoU | 57 | OneFormer (Swin-L, single-scale, 640x640)
Semantic Segmentation | COCO minival | AP | 52 | OneFormer (InternImage-H, single-scale)
Semantic Segmentation | COCO minival | PQ | 60 | OneFormer (InternImage-H, single-scale)
Semantic Segmentation | COCO minival | PQst | 49.2 | OneFormer (InternImage-H, single-scale)
Semantic Segmentation | COCO minival | PQth | 67.1 | OneFormer (InternImage-H, single-scale)
Semantic Segmentation | COCO minival | mIoU | 68.8 | OneFormer (InternImage-H, single-scale)
Semantic Segmentation | COCO minival | AP | 49.2 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | COCO minival | PQ | 58 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | COCO minival | PQst | 48.4 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | COCO minival | PQth | 64.3 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | COCO minival | mIoU | 68.1 | OneFormer (DiNAT-L, single-scale)
Semantic Segmentation | COCO minival | AP | 49 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | COCO minival | PQ | 57.9 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | COCO minival | PQst | 48 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | COCO minival | PQth | 64.4 | OneFormer (Swin-L, single-scale)
Semantic Segmentation | COCO minival | mIoU | 67.4 | OneFormer (Swin-L, single-scale)
Instance Segmentation | Cityscapes val | mask AP | 48.7 | OneFormer (ConvNeXt-L, single-scale, Mapillary-Pretrained)
Instance Segmentation | Cityscapes val | mask AP | 45.6 | OneFormer (DiNAT-L, single-scale)
Instance Segmentation | Cityscapes val | mask AP | 45.6 | OneFormer (Swin-L, single-scale)
Instance Segmentation | COCO val (panoptic labels) | AP | 52 | OneFormer (InternImage-H, emb_dim=1024, single-scale)
Instance Segmentation | COCO val (panoptic labels) | AP | 49.2 | OneFormer (DiNAT-L, single-scale)
Instance Segmentation | COCO val (panoptic labels) | AP | 49 | OneFormer (Swin-L, single-scale)
Instance Segmentation | ADE20K val | AP | 44.2 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation | ADE20K val | APL | 64.3 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation | ADE20K val | APM | 49.9 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation | ADE20K val | APS | 23.7 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained)
Instance Segmentation | ADE20K val | AP | 40.2 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation | ADE20K val | APL | 59.7 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation | ADE20K val | APM | 44.4 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation | ADE20K val | APS | 19.2 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain)
Instance Segmentation | ADE20K val | AP | 36 | OneFormer (DiNAT-L, single-scale)
Instance Segmentation | ADE20K val | AP | 35.9 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | Cityscapes test | PQ | 68 | OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-Pretrained)
Panoptic Segmentation | Cityscapes val | AP | 48.7 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation | Cityscapes val | PQ | 70.1 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation | Cityscapes val | PQst | 74.1 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation | Cityscapes val | PQth | 64.6 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation | Cityscapes val | mIoU | 84.6 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)
Panoptic Segmentation | Cityscapes val | AP | 46.5 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | Cityscapes val | PQ | 68.51 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | Cityscapes val | mIoU | 83 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | Cityscapes val | AP | 46.7 | OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation | Cityscapes val | PQ | 68.4 | OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation | Cityscapes val | mIoU | 83.6 | OneFormer (ConvNeXt-XL, single-scale)
Panoptic Segmentation | Cityscapes val | AP | 45.6 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Cityscapes val | PQ | 67.6 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Cityscapes val | mIoU | 83.1 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Cityscapes val | AP | 45.6 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | Cityscapes val | PQ | 67.2 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | Cityscapes val | mIoU | 83 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | Mapillary val | PQ | 46.7 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Mapillary val | PQst | 54.9 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Mapillary val | PQth | 40.5 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Mapillary val | mIoU | 61.7 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | Mapillary val | PQ | 46.4 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | Mapillary val | PQst | 54 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | Mapillary val | PQth | 40.6 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | Mapillary val | mIoU | 61.6 | OneFormer (ConvNeXt-L, single-scale)
Panoptic Segmentation | ADE20K val | AP | 40.2 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation | ADE20K val | PQ | 54.5 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation | ADE20K val | mIoU | 60.4 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896)
Panoptic Segmentation | ADE20K val | PQ | 53.4 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Panoptic Segmentation | ADE20K val | mIoU | 58.9 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain)
Panoptic Segmentation | ADE20K val | AP | 37.1 | OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation | ADE20K val | PQ | 51.5 | OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation | ADE20K val | mIoU | 58.3 | OneFormer (DiNAT-L, single-scale, 1280x1280)
Panoptic Segmentation | ADE20K val | AP | 37.8 | OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation | ADE20K val | PQ | 51.4 | OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation | ADE20K val | mIoU | 57 | OneFormer (Swin-L, single-scale, 1280x1280)
Panoptic Segmentation | ADE20K val | AP | 36 | OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | PQ | 50.5 | OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | mIoU | 58.3 | OneFormer (DiNAT-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | AP | 36.3 | OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | PQ | 50.1 | OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | mIoU | 57.4 | OneFormer (ConvNeXt-XL, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | AP | 36.2 | OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | PQ | 50 | OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | mIoU | 56.6 | OneFormer (ConvNeXt-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | AP | 35.9 | OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | PQ | 49.8 | OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation | ADE20K val | mIoU | 57 | OneFormer (Swin-L, single-scale, 640x640)
Panoptic Segmentation | COCO minival | AP | 52 | OneFormer (InternImage-H, single-scale)
Panoptic Segmentation | COCO minival | PQ | 60 | OneFormer (InternImage-H, single-scale)
Panoptic Segmentation | COCO minival | PQst | 49.2 | OneFormer (InternImage-H, single-scale)
Panoptic Segmentation | COCO minival | PQth | 67.1 | OneFormer (InternImage-H, single-scale)
Panoptic Segmentation | COCO minival | mIoU | 68.8 | OneFormer (InternImage-H, single-scale)
Panoptic Segmentation | COCO minival | AP | 49.2 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | COCO minival | PQ | 58 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | COCO minival | PQst | 48.4 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | COCO minival | PQth | 64.3 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | COCO minival | mIoU | 68.1 | OneFormer (DiNAT-L, single-scale)
Panoptic Segmentation | COCO minival | AP | 49 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | COCO minival | PQ | 57.9 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | COCO minival | PQst | 48 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | COCO minival | PQth | 64.4 | OneFormer (Swin-L, single-scale)
Panoptic Segmentation | COCO minival | mIoU | 67.4 | OneFormer (Swin-L, single-scale)
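
For readers unfamiliar with the metrics listed in the results, the sketch below shows the standard definitions of mIoU and per-class PQ in plain Python. It is only an illustration of the formulas, not the evaluation code behind the numbers on this page. The function names and toy inputs are hypothetical, and the listed values appear to be on a 0-100 scale while the functions return fractions between 0 and 1.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class-c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly labeled as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0

def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one class: sum of matched IoUs / (TP + 0.5 * FP + 0.5 * FN).

    `matched_ious` holds the IoU of each predicted/ground-truth segment pair
    that counts as a true positive (IoU above 0.5 in the usual definition).
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_positives + 0.5 * num_false_negatives
    return sum(matched_ious) / denom if denom > 0 else 0.0

if __name__ == "__main__":
    print(round(mean_iou([[3, 1], [1, 5]]), 4))
    print(round(panoptic_quality([0.8, 0.6], 1, 1), 4))
```

The overall PQ is the average of the per-class values. PQst and PQth restrict that average to "stuff" and "thing" classes respectively.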

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)