Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SeMask: Semantically Masked Transformers for Semantic Segmentation

Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

2021-12-23 · arXiv 2021 12 · Semantic Segmentation

Abstract

Finetuning a pretrained backbone in the encoder part of an image transformer network has been the traditional approach for the semantic segmentation task. However, such an approach leaves out the semantic context that an image provides during the encoding stage. This paper argues that incorporating semantic information of the image into pretrained hierarchical transformer-based backbones while finetuning improves the performance considerably. To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation. In addition, we use a lightweight semantic decoder during training to provide supervision to the intermediate semantic prior maps at every stage. Our experiments demonstrate that incorporating semantic priors enhances the performance of the established hierarchical encoders with a slight increase in the number of FLOPs. We provide empirical proof by integrating SeMask into Swin Transformer and Mix Transformer backbones as our encoder paired with different decoders. Our framework achieves a new state-of-the-art of 58.25% mIoU on the ADE20K dataset and improvements of over 3% in the mIoU metric on the Cityscapes dataset. The code and checkpoints are publicly available at https://github.com/Picsart-AI-Research/SeMask-Segmentation.
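
The abstract and the results on this page are reported in mIoU (mean intersection over union). As a rough illustration of what that metric measures, here is a minimal sketch in plain Python. It is not the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the `ignore_index=255` convention are assumptions made for this example. Benchmark tooling usually accumulates intersections and unions over the whole validation set before averaging, which this sketch does only if all pixels are passed in one call.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label; skipped in scoring
) -> float:
    """Mean IoU over classes that appear in the prediction or the target."""
    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: contributes to no class
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the true class and,
            # if valid, of the predicted class.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1

    # Average only over classes that occur at least once.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

In the toy example, the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. The values on this page are the same quantity expressed as a percentage.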

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | Cityscapes val | mIoU | 84.98 | SeMask (SeMask Swin-L Mask2Former)
Semantic Segmentation | Cityscapes val | mIoU | 80.39 | SeMask (SeMask Swin-L FPN)
Semantic Segmentation | ADE20K val | mIoU | 58.2 | SeMask (SeMask Swin-L FaPN-Mask2Former)
Semantic Segmentation | ADE20K val | mIoU | 58.2 | SeMask (SeMask Swin-L MSFaPN-Mask2Former)
Semantic Segmentation | ADE20K val | mIoU | 57.5 | SeMask (SeMask Swin-L Mask2Former)
Semantic Segmentation | ADE20K val | mIoU | 57 | SeMask (SeMask Swin-L MSFaPN-Mask2Former, single-scale)
Semantic Segmentation | ADE20K val | mIoU | 56.2 | SeMask (SeMask Swin-L MaskFormer)
Semantic Segmentation | ADE20K val | mIoU | 53.5 | SeMask (SeMask Swin-L FPN)
Semantic Segmentation | ADE20K | Validation mIoU | 58.2 | SeMask (SeMask Swin-L FaPN-Mask2Former)
Semantic Segmentation | ADE20K | Validation mIoU | 58.2 | SeMask (SeMask Swin-L MSFaPN-Mask2Former)
Semantic Segmentation | ADE20K | Validation mIoU | 57.5 | SeMask (SeMask Swin-L Mask2Former)
Semantic Segmentation | ADE20K | Validation mIoU | 57 | SeMask (SeMask Swin-L MSFaPN-Mask2Former, single-scale)
Semantic Segmentation | ADE20K | Validation mIoU | 56.2 | SeMask (SeMask Swin-L MaskFormer)
Semantic Segmentation | ADE20K | Validation mIoU | 53.52 | SeMask (SeMask Swin-L FPN)
Semantic Segmentation | ADE20K | Params (M) | 96 | SeMask (SeMask Swin-B FPN)
Semantic Segmentation | ADE20K | Validation mIoU | 50.98 | SeMask (SeMask Swin-B FPN)
Semantic Segmentation | ADE20K | Params (M) | 56 | SeMask (SeMask Swin-S FPN)
Semantic Segmentation | ADE20K | Validation mIoU | 47.63 | SeMask (SeMask Swin-S FPN)
Semantic Segmentation | ADE20K | Params (M) | 35 | SeMask (SeMask Swin-T FPN)
Semantic Segmentation | ADE20K | Validation mIoU | 43.16 | SeMask (SeMask Swin-T FPN)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)