Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, Li Zhang

2020-12-31 | CVPR 2021 | Segmentation, Semantic Segmentation, Medical Image Segmentation

Abstract

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on increasing the receptive field, through either dilated/atrous convolutions or inserted attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieve the first position on the highly competitive ADE20K test server leaderboard on the day of submission.
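To make the "image as a sequence of patches" idea concrete, the following is a minimal NumPy sketch of the patch-flattening step only. It is not the authors' implementation: the function names, the patch size of 16, and the channel-last layout are assumptions made for illustration, and the learned linear projection and position embeddings that a transformer encoder would apply afterwards are omitted.

```python
import numpy as np


def image_to_patch_sequence(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches, in raster order.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    The patch size of 16 is an assumed default, not taken from the abstract.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # (grid_h, p, grid_w, p, C) -> (grid_h, grid_w, p, p, C)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into a sequence and each patch into a vector.
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


def patch_sequence_to_grid(sequence: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Reshape a (num_patches, D) sequence back into a (grid_h, grid_w, D) feature map."""
    return sequence.reshape(grid_h, grid_w, -1)


if __name__ == "__main__":
    image = np.zeros((32, 48, 3), dtype=np.float32)  # hypothetical input size
    sequence = image_to_patch_sequence(image)
    print(sequence.shape)  # (6, 768): a 2 x 3 grid of 16 x 16 x 3 patches
```

Because the sequence length stays fixed through the encoder, the output tokens can be reshaped back into a 2D grid (as `patch_sequence_to_grid` does) before a decoder upsamples them to per-pixel predictions.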

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Medical Image Segmentation | Synapse multi-organ CT | Avg DSC | 79.6 | SETR |
| Semantic Segmentation | Cityscapes val | mIoU | 82.15 | SETR-PUP (80k, MS) |
| Semantic Segmentation | PASCAL Context | mIoU | 55.83 | SETR-MLA (16, 80k, MS) |
| Semantic Segmentation | FoodSeg103 | mIoU | 45.1 | SeTR-MLA (ViT-16/B) |
| Semantic Segmentation | FoodSeg103 | mIoU | 41.3 | SeTR-Naive (ViT-16/B) |
| Semantic Segmentation | UrbanLF | mIoU (Real) | 77.74 | SETR (ViT-Large) |
| Semantic Segmentation | UrbanLF | mIoU (Syn) | 77.69 | SETR (ViT-Large) |
| Semantic Segmentation | DADA-seg | mIoU | 31.8 | SETR (PUP, Transformer-Large) |
| Semantic Segmentation | DADA-seg | mIoU | 30.4 | SETR (MLA, Transformer-Large) |
| Semantic Segmentation | ADE20K | Validation mIoU | 50.28 | SETR-MLA (160k, MS) |
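The results above are reported as mean intersection-over-union (mIoU) and average Dice similarity coefficient (DSC). As a rough illustration of how these metrics are commonly computed from predicted and ground-truth label maps, here is a minimal NumPy sketch. The function names and the `ignore_index=255` convention are assumptions for this example, and each benchmark's official protocol (ignored labels, multi-scale inference, per-case versus pooled averaging) may differ from it.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes: int, ignore_index: int = 255) -> np.ndarray:
    """Build a (num_classes, num_classes) matrix: rows are true labels, columns are predictions.

    Pixels whose true label equals ignore_index (an assumed convention) are skipped.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    keep = y_true != ignore_index
    index = num_classes * y_true[keep].astype(np.int64) + y_pred[keep].astype(np.int64)
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm: np.ndarray) -> float:
    """Average TP / (TP + FP + FN) over classes that appear in labels or predictions."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())


def mean_dice(cm: np.ndarray) -> float:
    """Average 2 * TP / (2 * TP + FP + FN) over classes that appear in labels or predictions."""
    tp = np.diag(cm).astype(np.float64)
    denom = cm.sum(axis=0) + cm.sum(axis=1)
    present = denom > 0
    return float((2.0 * tp[present] / denom[present]).mean())


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(f"mIoU = {mean_iou(cm):.4f}, mean Dice = {mean_dice(cm):.4f}")
```

Accumulating one confusion matrix over the whole evaluation set and computing the metric once at the end is a common choice for mIoU; averaging per-image or per-case scores instead gives different numbers, so reported values are only comparable under the same protocol.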

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)