Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond

Chen Shuai, Meng Fanman, Zhang Runtong, Qiu Heqian, Li Hongliang, Wu Qingbo, Xu Linfeng

2023-08-15 · Zero-Shot Segmentation · Segmentation · Few-Shot Semantic Segmentation

Abstract

Few-shot segmentation (FSS) aims to segment novel classes given only a few annotated images. Because CLIP aligns visual and textual information, integrating CLIP can enhance the generalization ability of FSS models. However, even with the CLIP model, existing CLIP-based FSS methods still suffer from predictions biased towards base classes, caused by class-specific feature-level interactions. To solve this issue, we propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net). It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity. Specifically, the class-relevant textual and visual features are first transformed into a class-agnostic prior in the form of a probability map. Then, a Prior-Guided Mask Assemble Module (PGMAM) comprising multiple General Assemble Units (GAUs) is introduced. It considers diverse and plug-and-play interactions, such as visual-textual, inter- and intra-image, training-free, and high-order ones. Lastly, to ensure class-agnostic ability, a Hierarchical Decoder with Channel-Drop Mechanism (HDCDM) is proposed to flexibly exploit the assembled masks and low-level features without relying on any class-specific information. PGMA-Net achieves new state-of-the-art results on the FSS task, with mIoU of $77.6$ on $\text{PASCAL-}5^i$ and $59.4$ on $\text{COCO-}20^i$ in the 1-shot scenario. Beyond this, we show that, without extra re-training, the proposed PGMA-Net can also solve bbox-level and cross-domain FSS, co-segmentation, and zero-shot segmentation (ZSS) tasks, leading to an any-shot segmentation framework.
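The abstract's first step, turning class-relevant features into a "class-agnostic prior in the form of a probability map", can be pictured as projecting per-pixel visual features onto a class text embedding (CLIP-style) and normalizing the resulting affinity. The sketch below is a minimal illustration of that idea; the function name, shapes, and min-max normalization are assumptions for demonstration, not the authors' actual implementation.

```python
import numpy as np

def prior_probability_map(visual_feats, text_embed):
    """Hypothetical sketch: cosine affinity between pixel features and a
    class text embedding, rescaled into an (H, W) probability map.

    visual_feats: (H, W, C) per-pixel features
    text_embed:   (C,) class embedding
    """
    # L2-normalize both sides so the dot product is a cosine similarity
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    affinity = v @ t  # (H, W) cosine similarity in [-1, 1]
    # Min-max rescale to [0, 1] to act as a probability-map prior
    prior = (affinity - affinity.min()) / (affinity.max() - affinity.min() + 1e-8)
    return prior

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
text = rng.normal(size=16)
p = prior_probability_map(feats, text)
print(p.shape)  # (8, 8)
```

Because the map is built purely from affinity rather than class-specific weights, the same mechanism applies to any class embedding, which is the class-agnostic property the paper relies on.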

Results

Tasks: Few-Shot Learning · Few-Shot Semantic Segmentation · Meta-Learning (identical results are reported under all three leaderboards; consolidated below)

| Dataset | Setting | Model | Mean IoU | FB-IoU |
|---|---|---|---|---|
| PASCAL-5i | 1-shot | PGMA-Net (ResNet-50) | 74.1 | 83.5 |
| PASCAL-5i | 1-shot | PGMA-Net (ResNet-101) | 77.6 | 86.2 |
| PASCAL-5i | 1-shot | PGMA-Net (ViT-B/16) | 74.1 | 82.1 |
| PASCAL-5i | 5-shot | PGMA-Net (ResNet-50) | 75.2 | 84.2 |
| PASCAL-5i | 5-shot | PGMA-Net (ResNet-101) | 78.6 | 86.9 |
| PASCAL-5i | 5-shot | PGMA-Net (ViT-B/16) | 74.6 | 82.5 |
| COCO-20i | 1-shot | PGMA-Net (ResNet-50) | 54.3 | 75.8 |
| COCO-20i | 1-shot | PGMA-Net (ResNet-101) | 59.4 | 78.5 |
| COCO-20i | 5-shot | PGMA-Net (ResNet-50) | 57.1 | 76.7 |
| COCO-20i | 5-shot | PGMA-Net (ResNet-101) | 61.8 | 79.4 |
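The two metrics in the table can be computed from binary masks: IoU is intersection over union for the foreground, and FB-IoU (as commonly defined in the FSS literature; an assumption here, not stated on this page) averages the foreground IoU with the IoU of the inverted (background) masks. A minimal sketch:

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def fb_iou(pred, gt):
    """Foreground-background IoU: average of foreground and background IoU."""
    return 0.5 * (iou(pred, gt) + iou(~pred, ~gt))

pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=bool)
gt   = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]], dtype=bool)
print(round(iou(pred, gt), 3), round(fb_iou(pred, gt), 3))  # 0.5 0.607
```

Mean IoU on PASCAL-5i/COCO-20i then averages the per-class foreground IoU over the novel classes of each fold.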

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
- Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)