
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation

Kwanyoung Kim, Yujin Oh, Jong Chul Ye

Published: 2024-03-21
Tasks: Zero-Shot Semantic Segmentation, Semantic Segmentation

Abstract

The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, existing approaches that leverage pre-trained CLIP knowledge remain limited in how closely they align text embeddings with pixel embeddings. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed at enhancing the potential of multiple text prompts for matching associated pixel embeddings. We first propose Multi-Prompts Sinkhorn (MPS), based on the Optimal Transport (OT) algorithm, which leads multiple text prompts to selectively focus on various semantic features within image pixels. Moreover, inspired by the success of Sinkformers in unimodal settings, we introduce an extension of MPS, called Multi-Prompts Sinkhorn Attention (MPSA), which effectively replaces cross-attention mechanisms within the Transformer framework in multimodal settings. Through extensive experiments, we demonstrate that OTSeg achieves state-of-the-art (SOTA) performance with significant gains on Zero-Shot Semantic Segmentation (ZS3) tasks across three benchmark datasets.
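The abstract builds on Sinkhorn-based Optimal Transport to spread pixels across several text prompts. The sketch below is not the authors' MPS or MPSA; it is only the generic Sinkhorn iteration for entropy-regularized OT, written in plain Python. The uniform marginals, the toy cost matrix, the regularization strength `eps`, and the function name `sinkhorn` are all assumptions made for illustration.

```python
import math

def sinkhorn(cost, eps=0.1, n_iters=100):
    """Generic Sinkhorn iteration (illustrative, not the paper's code).

    cost: n x m nested list, e.g. 1 - similarity between n prompts and m pixels.
    Returns an n x m transport plan whose rows sum to 1/n and columns to 1/m
    (uniform marginals are an assumption of this sketch).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # assumed uniform mass per text prompt
    b = [1.0 / m] * m  # assumed uniform mass per pixel
    # Gibbs kernel: low cost -> large kernel value.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: 2 prompts x 3 pixels (hypothetical costs).
# Prompt 0 is cheap for pixel 0, prompt 1 is cheap for pixel 1,
# and pixel 2 is equally costly for both prompts.
cost = [[0.1, 0.9, 0.5],
        [0.8, 0.2, 0.5]]
plan = sinkhorn(cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(3)]
```

Because the marginal constraints force each prompt to carry an equal share of the total mass, no single prompt can claim every pixel, which is the intuition behind using OT to make multiple prompts attend to different regions.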

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Zero-Shot Semantic Segmentation | PASCAL VOC | Inductive Setting hIoU | 87.4 | OTSeg+ |
| Zero-Shot Semantic Segmentation | PASCAL VOC | Transductive Setting hIoU | 94.4 | OTSeg+ |
| Zero-Shot Semantic Segmentation | PASCAL VOC | Inductive Setting hIoU | 84.5 | OTSeg |
| Zero-Shot Semantic Segmentation | PASCAL VOC | Transductive Setting hIoU | 94.2 | OTSeg |
| Zero-Shot Semantic Segmentation | COCO-Stuff | Inductive Setting hIoU | 41.5 | OTSeg+ |
| Zero-Shot Semantic Segmentation | COCO-Stuff | Transductive Setting hIoU | 49.8 | OTSeg+ |
| Zero-Shot Semantic Segmentation | COCO-Stuff | Inductive Setting hIoU | 41.4 | OTSeg |
| Zero-Shot Semantic Segmentation | COCO-Stuff | Transductive Setting hIoU | 49.5 | OTSeg |
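The hIoU metric reported above is, in the zero-shot segmentation literature, the harmonic mean of the mIoU on seen classes and the mIoU on unseen classes. A minimal sketch of that computation follows; the function name `harmonic_iou` and the input values are hypothetical and are not taken from the paper's results.

```python
def harmonic_iou(miou_seen, miou_unseen):
    """Harmonic mean of seen-class and unseen-class mIoU (same units as inputs)."""
    total = miou_seen + miou_unseen
    if total == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2.0 * miou_seen * miou_unseen / total

# Hypothetical scores in percent, chosen only to illustrate the formula.
h = harmonic_iou(90.0, 80.0)
```

The harmonic mean is pulled toward the lower of the two inputs, so a model cannot reach a high hIoU by doing well on seen classes alone.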

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)