Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

2021-12-29
Tasks: Open Vocabulary Image Classification, Zero-Shot Semantic Segmentation, Image Classification, Zero-Shot Image Classification, Open Vocabulary Semantic Segmentation, Segmentation, Semantic Segmentation, Open-Vocabulary Semantic Segmentation, object-detection, Zero-Shot Learning, Object Detection, Language Modelling

Abstract

Recently, open-vocabulary image classification through vision-language pre-training has achieved remarkable results: a model can classify arbitrary categories without seeing additional annotated images of those categories. However, it remains unclear how to make open-vocabulary recognition work well on broader vision problems. This paper targets open-vocabulary semantic segmentation by building on an off-the-shelf pre-trained vision-language model, CLIP. Semantic segmentation and CLIP, however, operate at different visual granularities: semantic segmentation works on pixels, while CLIP works on whole images. To bridge this gap, we reject the prevalent one-stage FCN-based framework and advocate a two-stage semantic segmentation framework. The first stage extracts generalizable mask proposals, and the second stage uses an image-based CLIP model to perform open-vocabulary classification on the masked image crops generated in the first stage. Our experiments show that this two-stage framework outperforms FCN when trained only on the COCO Stuff dataset and evaluated on other datasets without fine-tuning. Moreover, this simple framework surpasses the previous state of the art in zero-shot semantic segmentation by a large margin: +29.5 hIoU on the Pascal VOC 2012 dataset and +8.9 hIoU on the COCO Stuff dataset. Given its simplicity and strong performance, we hope this framework will serve as a baseline to facilitate future research. The code is publicly available at https://github.com/MendelXu/zsseg.baseline.
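The two-stage idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the mask proposals are assumed to be given, and `toy_image_encoder` and `toy_text_embeddings` are hypothetical stand-ins for CLIP's image and text encoders, chosen only so the example runs without any model weights. The function names `masked_crop`, `classify_proposals`, and `assemble_segmentation` are likewise illustrative.

```python
import numpy as np


def masked_crop(image, mask):
    """Blank out pixels outside the mask, then crop to the mask's bounding box."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return (image * mask[..., None])[y0:y1, x0:x1]


def classify_proposals(image, masks, image_encoder, text_embeddings):
    """Stage 2: cosine similarity between each masked crop and each class embedding."""
    text = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    scores = []
    for mask in masks:
        feat = image_encoder(masked_crop(image, mask))
        scores.append(text @ (feat / np.linalg.norm(feat)))
    return np.stack(scores)  # shape: (num_masks, num_classes)


def assemble_segmentation(masks, scores):
    """Give each pixel the best-scoring class among the proposals that cover it."""
    num_classes = scores.shape[1]
    per_pixel = np.full((num_classes,) + masks.shape[1:], -np.inf)
    for mask, mask_scores in zip(masks, scores):
        for c in range(num_classes):
            per_pixel[c][mask] = np.maximum(per_pixel[c][mask], mask_scores[c])
    labels = per_pixel.argmax(axis=0)
    labels[~masks.any(axis=0)] = -1  # pixels covered by no proposal
    return labels


# Toy data: left half red, right half blue; one hand-made proposal per half.
image = np.zeros((4, 4, 3))
image[:, :2, 0] = 1.0
image[:, 2:, 2] = 1.0
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :, :2] = True
masks[1, :, 2:] = True

# Hypothetical stand-ins for CLIP: the "image encoder" is the mean colour of the
# crop, and the "text embeddings" are hand-picked vectors for classes red and blue.
toy_image_encoder = lambda crop: crop.reshape(-1, 3).mean(axis=0)
toy_text_embeddings = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

scores = classify_proposals(image, masks, toy_image_encoder, toy_text_embeddings)
labels = assemble_segmentation(masks, scores)
print(labels)
```

Because classification happens per masked crop rather than per pixel, the class list can be changed at inference time by swapping the text embeddings, without touching the proposal stage.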

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Open Vocabulary Semantic Segmentation | COCO-Stuff-171 | hIoU | 37.8 | ZSSeg |
| Open Vocabulary Semantic Segmentation | ADE20K-847 | mIoU | 7 | SimSeg |
| Open Vocabulary Semantic Segmentation | Cityscapes | mIoU | 34.5 | SimSeg |
| Open Vocabulary Semantic Segmentation | PascalVOC-20 | hIoU | 77.5 | ZSSeg |
| Open Vocabulary Semantic Segmentation | PASCAL Context-59 | mIoU | 47.7 | SimSeg |
| Open Vocabulary Semantic Segmentation | ADE20K-150 | mIoU | 20.5 | SimSeg |
| Zero-Shot Semantic Segmentation | PASCAL VOC | Inductive Setting hIoU | 77.5 | zsseg |
| Zero-Shot Semantic Segmentation | PASCAL VOC | Transductive Setting hIoU | 79.3 | zsseg |
| Zero-Shot Semantic Segmentation | COCO-Stuff | Inductive Setting hIoU | 36.3 | zsseg |
| Zero-Shot Semantic Segmentation | COCO-Stuff | Transductive Setting hIoU | 41.5 | zsseg |
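
This page does not define the hIoU metric. In zero-shot segmentation work it usually means the harmonic mean of the mIoU over seen classes and the mIoU over unseen classes, and the sketch below assumes that definition. The function name `harmonic_iou` and the input values are illustrative only and are not results from the paper.

```python
def harmonic_iou(miou_seen, miou_unseen):
    """Harmonic mean of seen-class and unseen-class mIoU (assumed hIoU definition)."""
    total = miou_seen + miou_unseen
    if total == 0:
        return 0.0  # both scores are zero, so the harmonic mean is taken as zero
    return 2 * miou_seen * miou_unseen / total


# Illustrative inputs only: a large gap between seen and unseen classes is
# penalised far more than it would be by an arithmetic mean (which gives 50).
balanced = harmonic_iou(50.0, 50.0)
imbalanced = harmonic_iou(80.0, 20.0)
print(balanced, imbalanced)
```

The harmonic mean rewards models that do well on both seen and unseen classes, which is why a single hIoU figure is often reported alongside or instead of the two separate mIoU values.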

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)