SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

2021-05-31, NeurIPS 2021. Tasks: 2D Semantic Segmentation, Thermal Image Segmentation, Crack Segmentation, Semantic Segmentation


Abstract

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder that outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which degrades performance when the testing resolution differs from the training resolution. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, thereby combining both local and global attention to produce powerful representations. We show that this simple and lightweight design is the key to efficient segmentation with Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on the Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C. Code will be released at: github.com/NVlabs/SegFormer.
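The abstract describes the MLP decoder only at a high level. The following is a minimal pure-Python sketch of how an all-MLP decoder of this kind can merge four multiscale encoder outputs, assuming encoder stages at strides 4, 8, 16 and 32 and the sequence "unify channels, upsample to 1/4 scale, concatenate, fuse, predict" as we read it from the paper body (it is not spelled out in this abstract). The channel widths, embedding size, class count, random untrained weights, and every function name here (`all_mlp_decoder`, `pixelwise_linear`, `upsample_nearest`, `argmax_mask`) are hypothetical; nearest-neighbour resizing is a simplification standing in for the bilinear resizing we believe the reference code uses.

```python
import random


def make_feature(channels, height, width, rng):
    """Random stand-in for one encoder stage output, indexed [channel][row][col]."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]


def make_linear(in_ch, out_ch, rng):
    """Untrained weight matrix [out_ch][in_ch] for a per-pixel (1x1) linear layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_ch)] for _ in range(out_ch)]


def pixelwise_linear(x, weight):
    """Apply the same linear map to the channel vector at every pixel (no bias)."""
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    assert all(len(row) == in_ch for row in weight), "weight/input channel mismatch"
    return [[[sum(w[c] * x[c][i][j] for c in range(in_ch)) for j in range(width)]
             for i in range(height)] for w in weight]


def upsample_nearest(x, height, width):
    """Nearest-neighbour resize to (height, width); a stand-in for bilinear resizing."""
    src_h, src_w = len(x[0]), len(x[0][0])
    return [[[ch[i * src_h // height][j * src_w // width] for j in range(width)]
             for i in range(height)] for ch in x]


def all_mlp_decoder(features, embed_dim, num_classes, rng):
    """Unify channels per stage, upsample to 1/4 scale, concatenate, fuse, predict."""
    out_h, out_w = len(features[0][0]), len(features[0][0][0])  # stage 1 is at 1/4 scale
    concatenated = []
    for f in features:
        unified = pixelwise_linear(f, make_linear(len(f), embed_dim, rng))
        concatenated.extend(upsample_nearest(unified, out_h, out_w))  # channel concat
    fused = pixelwise_linear(concatenated, make_linear(len(concatenated), embed_dim, rng))
    return pixelwise_linear(fused, make_linear(embed_dim, num_classes, rng))


def argmax_mask(logits):
    """Per-pixel index of the class with the highest score."""
    num_classes, height, width = len(logits), len(logits[0]), len(logits[0][0])
    return [[max(range(num_classes), key=lambda k: logits[k][i][j]) for j in range(width)]
            for i in range(height)]


# Toy example: a 32x32 "image" yields stage maps at strides 4, 8, 16 and 32.
rng = random.Random(0)
image_h, image_w = 32, 32
stage_channels = (4, 8, 16, 32)  # hypothetical toy widths, not the real MiT widths
strides = (4, 8, 16, 32)
features = [make_feature(c, image_h // s, image_w // s, rng)
            for c, s in zip(stage_channels, strides)]
logits = all_mlp_decoder(features, embed_dim=8, num_classes=3, rng=rng)
mask = argmax_mask(logits)
print(len(logits), len(logits[0]), len(logits[0][0]))  # -> 3 8 8
```

Every decoder layer here is a per-pixel linear map, so the decoder has no spatial kernels of its own; spatial context comes entirely from the encoder stages. Mixing the high-resolution early stages with the coarse late stages is, as we read it, what the abstract means by combining local and global attention. In a real model the weights would be learned, and the 1/4-scale logits would typically be resized to the input resolution before computing the loss or the final mask.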

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	US3D	mIoU	75.14	SegFormer-B2
Semantic Segmentation	US3D	mIoU	74.19	SegFormer-B1
Semantic Segmentation	US3D	mIoU	71.8	SegFormer-B0
Semantic Segmentation	DELIVER	mIoU	57.2	SegFormer
Semantic Segmentation	UPLight	mIoU	89.6	SegFormer-B2 (RGB)
Semantic Segmentation	Fine-Grained Grass Segmentation Dataset	mIoU	48.29	SegFormer
Semantic Segmentation	DSEC	mIoU	71.99	SegFormer-B2
Semantic Segmentation	Synthetic Bathing Perception	mIoU	86.86	SegFormer
Semantic Segmentation	Cityscapes val	mIoU	84	SegFormer (MiT-B5, Mapillary)
Semantic Segmentation	Cityscapes val	Validation mIoU	76.2	SegFormer-B0
Semantic Segmentation	SELMA	mIoU	77.2	SegFormer
Semantic Segmentation	ZJU-RGB-P	mIoU	89.6	SegFormer-B2 (RGB)
Semantic Segmentation	DDD17	mIoU	71.05	SegFormer-B2
Semantic Segmentation	ADE20K val	mIoU	51.8	SegFormer-B5 (MS, 87M params, ImageNet-1K pretrain)
Semantic Segmentation	SpectralWaste	mIoU	54.3	SegFormer (HYPER)
Semantic Segmentation	SpectralWaste	mIoU	53.5	SegFormer (HYPER3)
Semantic Segmentation	SpectralWaste	mIoU	48.4	SegFormer (RGB)
Semantic Segmentation	Potsdam	mIoU	84.65	SegFormer-B2
Semantic Segmentation	Potsdam	mIoU	84.37	SegFormer-B1
Semantic Segmentation	Potsdam	mIoU	83.67	SegFormer-B0
Semantic Segmentation	UrbanLF	mIoU (Real)	82.2	SegFormer
Semantic Segmentation	UrbanLF	mIoU (Syn)	78.53	SegFormer
Semantic Segmentation	COCO-Stuff full	Mean IoU (class)	46.7	SegFormer-B5 (Single Scale)
Semantic Segmentation	EventScape	mIoU	59.86	SegFormer-B4
Semantic Segmentation	EventScape	mIoU	58.69	SegFormer-B2
Semantic Segmentation	Vaihingen	mIoU	76.92	SegFormer-B1
Semantic Segmentation	Vaihingen	mIoU	76.69	SegFormer-B2
Semantic Segmentation	Vaihingen	mIoU	75.57	SegFormer-B0
Semantic Segmentation	DADA-seg	mIoU	27	SegFormer (MiT-B3)
Semantic Segmentation	DADA-seg	mIoU	21.2	SegFormer (MiT-B2)
Semantic Segmentation	DADA-seg	mIoU	16.6	SegFormer (MiT-B1)
Semantic Segmentation	ADE20K	Params (M)	84.7	SegFormer-B5
Semantic Segmentation	ADE20K	Validation mIoU	51.8	SegFormer-B5
Semantic Segmentation	ADE20K	Params (M)	64.1	SegFormer-B4
Semantic Segmentation	ADE20K	Validation mIoU	51.1	SegFormer-B4
Semantic Segmentation	ADE20K	Params (M)	3.8	SegFormer-B0
Semantic Segmentation	ADE20K	Validation mIoU	37.4	SegFormer-B0
Semantic Segmentation	RGB-T-Glass-Segmentation	MAE	0.053	SegFormer
Semantic Segmentation	MFN Dataset	mIoU	54.8	SegFormer (B4)
Semantic Segmentation	MFN Dataset	mIoU	53.2	SegFormer (B2)
Semantic Segmentation	CrackVision12K	mIoU	0.57969	SegFormer
2D Semantic Segmentation	WildScenes	mIoU	40.83	SegFormer (MiT-B5)
Scene Segmentation	RGB-T-Glass-Segmentation	MAE	0.053	SegFormer
Scene Segmentation	MFN Dataset	mIoU	54.8	SegFormer (B4)
Scene Segmentation	MFN Dataset	mIoU	53.2	SegFormer (B2)