TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/SegViTv2: Exploring Efficient and Continual Semantic Segme...

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

2023-06-09Continual LearningContinual Semantic SegmentationSegmentationSemantic Segmentation
PaperPDFCode(official)

Abstract

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces \textbf{SegViTv2}. In this study, we introduce a novel Attention-to-Mask (\atm) module to design a lightweight decoder effective for plain ViT. The proposed ATM converts the global attention map into semantic masks for high-quality segmentation results. Our decoder outperforms the popular decoder UPerNet using various ViT backbones while consuming only about $5\%$ of the computational cost. For the encoder, we address the concern of the relatively high computational cost in the ViT-based encoders and propose a \emph{Shrunk++} structure that incorporates edge-aware query-based down-sampling (EQD) and query-based upsampling (QU) modules. The Shrunk++ structure reduces the computational cost of the encoder by up to $50\%$ while maintaining competitive performance. Furthermore, we propose to adapt SegViT for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge. Experiments show that our proposed SegViTv2 surpasses recent segmentation methods on three popular benchmarks including ADE20k, COCO-Stuff-10k and PASCAL-Context datasets. The code is available through the following link: \url{https://github.com/zbwxp/SegVit}.

Results

TaskDatasetMetricValueModel
Semantic SegmentationADE20KGFLOPs (512 x 512)637.9SegViT-v2 (BEiT-v2-Large)
Semantic SegmentationADE20KValidation mIoU58.2SegViT-v2 (BEiT-v2-Large)
10-shot image generationADE20KGFLOPs (512 x 512)637.9SegViT-v2 (BEiT-v2-Large)
10-shot image generationADE20KValidation mIoU58.2SegViT-v2 (BEiT-v2-Large)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction2025-07-21Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction2025-07-17DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model2025-07-17From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation2025-07-17Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion2025-07-17SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation2025-07-17Unified Medical Image Segmentation with State Space Modeling Snake2025-07-17A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique2025-07-17