CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

2023-03-21 · CVPR 2024

Tasks: Open-Vocabulary Semantic Segmentation, Semantic Segmentation, Image Segmentation, Segmentation, Text Similarity


Abstract

Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions. In this work, we introduce a novel cost-based approach to adapt vision-language foundation models, notably CLIP, for the intricate task of semantic segmentation. By aggregating the cosine similarity score, i.e., the cost volume between image and text embeddings, our method effectively adapts CLIP for segmenting seen and unseen classes by fine-tuning its encoders, addressing the challenges existing methods face in handling unseen classes. Building on this, we explore how to aggregate the cost volume effectively, taking into account its multi-modal nature: it is established between image and text embeddings. Furthermore, we examine various methods for efficiently fine-tuning CLIP.
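The abstract defines the cost volume as the cosine similarity between every pixel's image embedding and every class's text embedding. The sketch below illustrates that idea with tiny hand-made embeddings, which stand in for CLIP features. It does not reproduce CAT-Seg's learned aggregation or its CLIP fine-tuning. The helper names (`cost_volume`, `box_filter`, `segment`) are hypothetical, and the 3x3 mean filter is a deliberately simple, assumed stand-in for "aggregating the cost volume". It only shows why smoothing the per-class cost maps before taking a per-pixel argmax can suppress isolated noisy pixels.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cost_volume(image_emb, text_emb):
    """image_emb: H x W x D dense image embeddings; text_emb: N x D class embeddings.
    Returns an N x H x W cost volume of cosine similarities."""
    texts = [l2_normalize(t) for t in text_emb]
    H, W = len(image_emb), len(image_emb[0])
    pixels = [[l2_normalize(image_emb[i][j]) for j in range(W)] for i in range(H)]
    return [
        [[sum(a * b for a, b in zip(pixels[i][j], t)) for j in range(W)] for i in range(H)]
        for t in texts
    ]


def box_filter(cost_map):
    """Hypothetical toy spatial aggregation: 3x3 mean, neighborhoods clipped at borders."""
    H, W = len(cost_map), len(cost_map[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [cost_map[y][x]
                    for y in range(max(0, i - 1), min(H, i + 2))
                    for x in range(max(0, j - 1), min(W, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out


def segment(cost, aggregate=True):
    """Optionally smooth each class's cost map, then label each pixel by argmax over classes."""
    if aggregate:
        cost = [box_filter(c) for c in cost]
    N, H, W = len(cost), len(cost[0]), len(cost[0][0])
    return [[max(range(N), key=lambda n: cost[n][i][j]) for j in range(W)]
            for i in range(H)]


# Toy 3x3 "image" that clearly belongs to class 0, except for one noisy center pixel.
img = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
img[1][1] = [0.4, 0.6]
texts = [[1.0, 0.0], [0.0, 1.0]]  # toy text embeddings for class 0 and class 1

cost = cost_volume(img, texts)
print(segment(cost, aggregate=False))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(segment(cost))                   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Without aggregation, the noisy center pixel is labeled class 1. After the toy 3x3 averaging, its neighbors' class-0 evidence dominates. A fixed box filter will also blur genuine object boundaries, which is one reason a learned aggregation, as the abstract describes, is preferable to hand-set smoothing.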

Results

Task	Dataset	Metric	Value	Model
Open Vocabulary Semantic Segmentation	ADE20K-847	mIoU	16	CAT-Seg
Open Vocabulary Semantic Segmentation	PascalVOC-20b	mIoU	82.5	CAT-Seg
Open Vocabulary Semantic Segmentation	PASCAL Context-459	mIoU	23.8	CAT-Seg
Open Vocabulary Semantic Segmentation	PascalVOC-20	mIoU	97	CAT-Seg
Open Vocabulary Semantic Segmentation	PASCAL Context-59	mIoU	63.3	CAT-Seg
Open Vocabulary Semantic Segmentation	ADE20K-150	mIoU	37.9	CAT-Seg
