Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

Published 2021-05-31 · NeurIPS 2021
Tasks: 2D Semantic Segmentation · Thermal Image Segmentation · Crack Segmentation · Semantic Segmentation
Links: Paper · PDF · Code (one official implementation and multiple community implementations)

Abstract

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes which leads to decreased performance when the testing resolution differs from training. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, thus combining both local and global attention to render powerful representations. We show that this simple and lightweight design is the key to efficient segmentation with Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on the Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C. Code will be released at: github.com/NVlabs/SegFormer.
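
The All-MLP decoder described in point 2) can be sketched in a few lines: each encoder stage's multiscale features are projected to a shared channel dimension, upsampled to the stride-4 grid, concatenated, and fused into per-pixel class logits. The sketch below is a NumPy shape illustration under stated assumptions, not the official implementation: the channel dimensions, random weights, and nearest-neighbour upsampling are placeholders (the paper uses learned MLP layers and bilinear upsampling).

```python
# Hedged NumPy sketch of a SegFormer-style All-MLP decoder.
# Assumed/illustrative: channel dims, random weights, nearest upsampling.
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w):
    # Channels-last linear projection: (H, W, C_in) @ (C_in, C_out).
    return x @ w

def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling along the two spatial axes.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def all_mlp_decoder(features, embed_dim=64, num_classes=19):
    # features: list of (H_i, W_i, C_i) arrays from the encoder stages,
    # ordered from highest resolution (stride 4) to lowest (stride 32).
    h, w = features[0].shape[:2]
    unified = []
    for f in features:
        w_proj = rng.standard_normal((f.shape[-1], embed_dim)) * 0.02
        g = linear(f, w_proj)                     # unify channel dims
        g = upsample_nearest(g, h // g.shape[0])  # bring to stride-4 grid
        unified.append(g)
    fused = np.concatenate(unified, axis=-1)      # (H, W, 4 * embed_dim)
    w_fuse = rng.standard_normal((fused.shape[-1], num_classes)) * 0.02
    return linear(fused, w_fuse)                  # per-pixel class logits

# Multiscale features at strides 4/8/16/32 for a 64x64 input:
feats = [rng.standard_normal((s, s, c))
         for s, c in [(16, 32), (8, 64), (4, 160), (2, 256)]]
logits = all_mlp_decoder(feats)
print(logits.shape)  # (16, 16, 19)
```

Because the decoder only mixes already-computed features with linear layers, its cost is negligible next to the encoder, which is the design point the abstract emphasizes.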

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | US3D | mIoU | 75.14 | SegFormer-B2 |
| Semantic Segmentation | US3D | mIoU | 74.19 | SegFormer-B1 |
| Semantic Segmentation | US3D | mIoU | 71.8 | SegFormer-B0 |
| Semantic Segmentation | DELIVER | mIoU | 57.2 | SegFormer |
| Semantic Segmentation | UPLight | mIoU | 89.6 | SegFormer-B2 (RGB) |
| Semantic Segmentation | Fine-Grained Grass Segmentation Dataset | mIoU | 48.29 | SegFormer |
| Semantic Segmentation | DSEC | mIoU | 71.99 | SegFormer-B2 |
| Semantic Segmentation | Synthetic Bathing Perception | mIoU | 86.86 | SegFormer |
| Semantic Segmentation | Cityscapes val | mIoU | 84.0 | SegFormer (MiT-B5, Mapillary) |
| Semantic Segmentation | Cityscapes val | Validation mIoU | 76.2 | SegFormer-B0 |
| Semantic Segmentation | SELMA | mIoU | 77.2 | SegFormer |
| Semantic Segmentation | ZJU-RGB-P | mIoU | 89.6 | SegFormer-B2 (RGB) |
| Semantic Segmentation | DDD17 | mIoU | 71.05 | SegFormer-B2 |
| Semantic Segmentation | ADE20K val | mIoU | 51.8 | SegFormer-B5 (MS, 87M #Params, ImageNet-1K pretrain) |
| Semantic Segmentation | SpectralWaste | mIoU | 54.3 | SegFormer (HYPER) |
| Semantic Segmentation | SpectralWaste | mIoU | 53.5 | SegFormer (HYPER3) |
| Semantic Segmentation | SpectralWaste | mIoU | 48.4 | SegFormer (RGB) |
| Semantic Segmentation | Potsdam | mIoU | 84.65 | SegFormer-B2 |
| Semantic Segmentation | Potsdam | mIoU | 84.37 | SegFormer-B1 |
| Semantic Segmentation | Potsdam | mIoU | 83.67 | SegFormer-B0 |
| Semantic Segmentation | UrbanLF | mIoU (Real) | 82.2 | SegFormer |
| Semantic Segmentation | UrbanLF | mIoU (Syn) | 78.53 | SegFormer |
| Semantic Segmentation | COCO-Stuff full | Mean IoU (class) | 46.7 | SegFormer-B5 (Single Scale) |
| Semantic Segmentation | EventScape | mIoU | 59.86 | SegFormer-B4 |
| Semantic Segmentation | EventScape | mIoU | 58.69 | SegFormer-B2 |
| Semantic Segmentation | Vaihingen | mIoU | 76.92 | SegFormer-B1 |
| Semantic Segmentation | Vaihingen | mIoU | 76.69 | SegFormer-B2 |
| Semantic Segmentation | Vaihingen | mIoU | 75.57 | SegFormer-B0 |
| Semantic Segmentation | DADA-seg | mIoU | 27.0 | SegFormer (MiT-B3) |
| Semantic Segmentation | DADA-seg | mIoU | 21.2 | SegFormer (MiT-B2) |
| Semantic Segmentation | DADA-seg | mIoU | 16.6 | SegFormer (MiT-B1) |
| Semantic Segmentation | ADE20K | Params (M) | 84.7 | SegFormer-B5 |
| Semantic Segmentation | ADE20K | Validation mIoU | 51.8 | SegFormer-B5 |
| Semantic Segmentation | ADE20K | Params (M) | 64.1 | SegFormer-B4 |
| Semantic Segmentation | ADE20K | Validation mIoU | 51.1 | SegFormer-B4 |
| Semantic Segmentation | ADE20K | Params (M) | 3.8 | SegFormer-B0 |
| Semantic Segmentation | ADE20K | Validation mIoU | 37.4 | SegFormer-B0 |
| Semantic Segmentation | RGB-T-Glass-Segmentation | MAE | 0.053 | SegFormer |
| Semantic Segmentation | MFN Dataset | mIoU | 54.8 | SegFormer (B4) |
| Semantic Segmentation | MFN Dataset | mIoU | 53.2 | SegFormer (B2) |
| Semantic Segmentation | CrackVision12K | mIoU | 0.57969 | SegFormer |
| 2D Semantic Segmentation | WildScenes | mIoU | 40.83 | SegFormer (MiT-B5) |
| Scene Segmentation | RGB-T-Glass-Segmentation | MAE | 0.053 | SegFormer |
| Scene Segmentation | MFN Dataset | mIoU | 54.8 | SegFormer (B4) |
| Scene Segmentation | MFN Dataset | mIoU | 53.2 | SegFormer (B2) |
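
Nearly all of the entries above report mIoU: the intersection-over-union per class, averaged over classes. As a reference for how such a number is derived, here is a generic sketch of the computation from flat integer label maps; it illustrates the standard metric, not any particular benchmark's evaluation script.

```python
# Generic mean-IoU sketch: per-class IoU averaged over classes that
# appear in either map. Illustrative only, not a benchmark's evaluator.
from collections import Counter

def mean_iou(pred, gt, num_classes):
    # pred, gt: equal-length flat sequences of integer class ids.
    inter = Counter()
    pred_count = Counter(pred)
    gt_count = Counter(gt)
    for p, g in zip(pred, gt):
        if p == g:
            inter[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + gt_count[c] - inter[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(inter[c] / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1, 2, 2]
gt   = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, gt, 3))  # 0.5
```

Benchmarks differ in details such as ignore labels and whether IoU is accumulated over the whole dataset or per image, which is one reason numbers for the same model vary across the datasets listed above.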

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)