Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Per-Pixel Classification is Not All You Need for Semantic Segmentation

Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

Date: 2021-07-13
Venue: NeurIPS 2021 (2021-12)
Tasks: Panoptic Segmentation, Segmentation, Semantic Segmentation, Classification

Abstract

Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
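
To make the mask classification idea concrete, here is a minimal sketch of how a set of binary masks, each with one global class prediction, could be turned into a per-pixel semantic label map. The abstract does not spell out the inference rule, so the combination used here (weight each mask by its class probabilities, sum over masks, take the per-pixel argmax) is an assumption, as are the sigmoid on mask logits, the trailing "no object" class, and the function name `semantic_inference`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def semantic_inference(class_logits, mask_logits):
    """Combine N mask predictions into an H x W label map.

    class_logits: N rows of K + 1 scores; the last entry is assumed to be
                  a "no object" class and is ignored when labelling pixels.
    mask_logits:  N masks, each an H x W grid of logits.
    """
    num_classes = len(class_logits[0]) - 1
    # Per-mask class probabilities, with the "no object" entry dropped.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [0.0] * num_classes
            for probs, mask in zip(class_probs, mask_logits):
                weight = sigmoid(mask[y][x])  # how strongly this mask covers the pixel
                for c in range(num_classes):
                    scores[c] += probs[c] * weight
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Two masks on a 1 x 2 image: mask 0 is class 0 and covers the left pixel,
# mask 1 is class 1 and covers the right pixel.
toy_labels = semantic_inference(
    class_logits=[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    mask_logits=[[[4.0, -4.0]], [[-4.0, 4.0]]],
)
```

Note that the class prediction is made once per mask rather than once per pixel, which is the difference from per-pixel classification that the abstract emphasizes.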

Results

Task                   Dataset        Metric           Value  Model
Semantic Segmentation  Mapillary val  mIoU             55.4   MaskFormer (ResNet-50)
Semantic Segmentation  ADE20K val     mIoU             55.6   MaskFormer (Swin-L, ImageNet-22k pretrain)
Semantic Segmentation  ADE20K         Validation mIoU  53.8   MaskFormer (Swin-B)
Semantic Segmentation  ADE20K         Validation mIoU  48.1   MaskFormer (ResNet-101)
Panoptic Segmentation  COCO test-dev  PQ               53.3   MaskFormer (Swin-L)
Panoptic Segmentation  COCO test-dev  PQst             44.5   MaskFormer (Swin-L)
Panoptic Segmentation  COCO test-dev  PQth             59.1   MaskFormer (Swin-L)
Panoptic Segmentation  ADE20K val     PQ               35.7   MaskFormer (R101 + 6 Enc)
Panoptic Segmentation  COCO minival   PQ               52.7   MaskFormer (single-scale)
Panoptic Segmentation  COCO minival   PQst             44     MaskFormer (single-scale)
Panoptic Segmentation  COCO minival   PQth             58.5   MaskFormer (single-scale)
Panoptic Segmentation  COCO minival   RQ               63.5   MaskFormer (single-scale)
Panoptic Segmentation  COCO minival   SQ               81.8   MaskFormer (single-scale)
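
For readers unfamiliar with the PQ, SQ, and RQ columns, the following is a small sketch of the standard panoptic quality definition for a single class: SQ is the mean IoU of matched segments, RQ is an F1-style recognition score, and PQ is their product. The function name `panoptic_quality` and the toy inputs are hypothetical and are not taken from the paper or its code. Benchmark numbers are typically averaged over classes, so a reported PQ need not equal the product of the reported SQ and RQ.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each true-positive match between a predicted and a
                  ground-truth segment (a match conventionally requires
                  IoU > 0.5).
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No predictions and no ground truth for this class.
        return 0.0, 0.0, 0.0

    # Segmentation quality: average IoU over matched segments.
    sq = sum(matched_ious) / num_true_positives if num_true_positives else 0.0
    # Recognition quality: F1-style score over segments.
    rq = num_true_positives / denominator
    return sq * rq, sq, rq


# Toy example: two matches, one unmatched prediction, one missed segment.
pq, sq, rq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
```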
