Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Representation Separation for Semantic Segmentation with Vision Transformers

Yuanduo Hong, Huihui Pan, Weichao Sun, Xinghu Yu, Huijun Gao

2022-12-28 · Semantic Segmentation

Abstract

Vision transformers (ViTs), which encode an image as a sequence of patches, bring new paradigms for semantic segmentation. We present an efficient framework of representation separation at the local-patch level and the global-region level for semantic segmentation with ViTs. It targets the peculiar over-smoothness of ViTs in semantic segmentation, and therefore differs from the currently popular paradigms of context modeling and from most existing related methods, which reinforce the advantage of attention. We first introduce a decoupled two-pathway network, in which a second pathway enhances and passes down local-patch discrepancy complementary to the global representations of transformers. We then propose a spatially adaptive separation module to obtain more separate deep representations, and a discriminative cross-attention that yields more discriminative region representations through novel auxiliary supervisions. The proposed methods achieve the following results: 1) incorporated with large-scale plain ViTs, our methods achieve new state-of-the-art performance on five widely used benchmarks; 2) using masked pre-trained plain ViTs, we achieve 68.9% mIoU on Pascal Context, setting a new record; 3) pyramid ViTs integrated with the decoupled two-pathway network even surpass well-designed high-resolution ViTs on Cityscapes; 4) the representations improved by our framework transfer favorably to images with natural corruptions. The code will be released publicly.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K val | mIoU | 58.4 | RSSeg-ViT-L (BEiT pretrain) |
| Semantic Segmentation | PASCAL Context | mIoU | 68.9 | RSSeg-ViT-L (BEiT pretrain) |
| Semantic Segmentation | PASCAL Context | mIoU | 67.5 | RSSeg-ViT-L |
| Semantic Segmentation | ADE20K | Params (M) | 330 | RSSeg-ViT-L (BEiT pretrain) |
| Semantic Segmentation | ADE20K | Validation mIoU | 58.4 | RSSeg-ViT-L (BEiT pretrain) |
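
The results above are reported in mIoU (mean intersection over union). As a reminder of what that number measures, here is a minimal sketch of the usual definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. The helper names `confusion_matrix` and `mean_iou` are illustrative, and skipping classes that appear in neither the ground truth nor the prediction is one common convention, assumed here. The exact evaluation protocol used for the listed results (ignored labels, multi-scale testing, and so on) is not stated on this page.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes absent from both truth and prediction are skipped
    (an assumed convention; benchmarks may differ).
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # truth c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, truth other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixel labels over three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(truth, pred, 3))
print(f"mIoU = {100 * score:.1f}%")
```

In practice the confusion matrix is accumulated over every pixel of the whole validation set before the per-class ratios are taken, rather than averaged image by image.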

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)