Side Adapter Network for Open-Vocabulary Semantic Segmentation

Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

2023-02-23CVPR 2023 1Open Vocabulary Semantic Segmentation Zero Shot Segmentation Segmentation Semantic Segmentation Open-Vocabulary Semantic Segmentation Language Modelling

Paper PDF Code(official)Code Code

Abstract

This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks. This decoupled design has the benefit CLIP in recognizing the class of mask proposals. Since the attached side network can reuse CLIP features, it can be very light. In addition, the entire network can be trained end-to-end, allowing the side network to be adapted to the frozen CLIP model, which makes the predicted mask proposals CLIP-aware. Our approach is fast, accurate, and only adds a few additional trainable parameters. We evaluate our approach on multiple semantic segmentation benchmarks. Our method significantly outperforms other counterparts, with up to 18 times fewer trainable parameters and 19 times faster inference speed. We hope our approach will serve as a solid baseline and help ease future research in open-vocabulary semantic segmentation. The code will be available at https://github.com/MendelXu/SAN.

Results

Task	Dataset	Metric	Value	Model
Zero Shot Segmentation	Segmentation in the Wild	Mean AP	41.4	SAN
Open Vocabulary Semantic Segmentation	ADE20K-847	mIoU	13.7	SAN

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction2025-07-21 Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21 Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction2025-07-17 DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model2025-07-17 From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation2025-07-17 Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion2025-07-17 SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation2025-07-17 Unified Medical Image Segmentation with State Space Modeling Snake2025-07-17