Per-Pixel Classification is Not All You Need for Semantic Segmentation

Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

2021-07-13, NeurIPS 2021
Tasks: Panoptic Segmentation, Semantic Segmentation, Segmentation, Classification

Abstract

Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
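
To make the mask classification formulation concrete, the sketch below shows one plausible way to turn a set of binary mask predictions, each paired with a single global class distribution, into a per-pixel semantic label map. The function name `semantic_inference`, the toy inputs, and the combination rule (sum over masks of class probability times mask probability, then argmax over classes) are illustrative assumptions, not details stated in this abstract.

```python
def semantic_inference(class_probs, mask_probs):
    """Combine N (class distribution, mask) pairs into one per-pixel label map.

    class_probs: N lists of K class probabilities, one global distribution per mask.
    mask_probs:  N masks, each an H x W grid of values in [0, 1].
    Returns an H x W grid of class indices.
    Assumption: pixel score for class c is sum_i class_probs[i][c] * mask_probs[i][y][x].
    """
    num_classes = len(class_probs[0])
    height, width = len(mask_probs[0]), len(mask_probs[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score every class at this pixel by pooling evidence from all masks.
            scores = [
                sum(p[c] * m[y][x] for p, m in zip(class_probs, mask_probs))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example: two masks, two classes, a 2 x 3 image (hypothetical values).
class_probs = [[0.9, 0.1], [0.2, 0.8]]
mask_probs = [
    [[0.9, 0.8, 0.1], [0.7, 0.2, 0.1]],
    [[0.1, 0.2, 0.9], [0.3, 0.8, 0.9]],
]
label_map = semantic_inference(class_probs, mask_probs)
print(label_map)  # [[0, 0, 1], [0, 1, 1]]
```

Note that the class label is predicted once per mask rather than once per pixel, so the number of masks does not need to match the number of classes.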

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	Mapillary val	mIoU	55.4	MaskFormer (ResNet-50)
Semantic Segmentation	ADE20K val	mIoU	55.6	MaskFormer (Swin-L, ImageNet-22k pretrain)
Semantic Segmentation	ADE20K val	mIoU	53.8	MaskFormer (Swin-B)
Semantic Segmentation	ADE20K val	mIoU	48.1	MaskFormer (ResNet-101)
Panoptic Segmentation	COCO test-dev	PQ	53.3	MaskFormer (Swin-L)
Panoptic Segmentation	COCO test-dev	PQst	44.5	MaskFormer (Swin-L)
Panoptic Segmentation	COCO test-dev	PQth	59.1	MaskFormer (Swin-L)
Panoptic Segmentation	ADE20K val	PQ	35.7	MaskFormer (R101 + 6 Enc)
Panoptic Segmentation	COCO minival	PQ	52.7	MaskFormer (single-scale)
Panoptic Segmentation	COCO minival	PQst	44	MaskFormer (single-scale)
Panoptic Segmentation	COCO minival	PQth	58.5	MaskFormer (single-scale)
Panoptic Segmentation	COCO minival	RQ	63.5	MaskFormer (single-scale)
Panoptic Segmentation	COCO minival	SQ	81.8	MaskFormer (single-scale)
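
The PQ, SQ, and RQ columns above are related by the standard panoptic quality definition. The following is a minimal sketch of that definition for a single class, assuming predicted and ground-truth segments have already been matched (a match is conventionally an IoU above 0.5). The function name `panoptic_quality` and the example numbers are hypothetical.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each matched (true positive) segment pair.
    SQ is the mean IoU over matches, RQ is an F1-style detection score,
    and PQ is their product.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0 or num_true_positives == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / num_true_positives
    rq = num_true_positives / denominator
    return sq * rq, sq, rq


# Hypothetical class: three matches, one spurious prediction, one missed segment.
pq, sq, rq = panoptic_quality([0.9, 0.8, 0.7], 1, 1)
print(round(pq, 3), round(sq, 3), round(rq, 3))  # 0.6 0.8 0.75
```

Benchmark-level numbers are averaged over classes, so the reported PQ need not equal the product of the reported SQ and RQ: for the COCO minival row, 81.8 × 63.5 gives roughly 51.9, while the reported PQ is 52.7.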
