Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond

Chen Shuai, Meng Fanman, Zhang Runtong, Qiu Heqian, Li Hongliang, Wu Qingbo, Xu Linfeng

2023-08-15 · Zero-Shot Segmentation · Segmentation · Few-Shot Semantic Segmentation

Abstract

Few-shot segmentation (FSS) aims to segment novel classes given only a few annotated images. Because CLIP aligns visual and textual information, integrating CLIP can enhance the generalization ability of FSS models. However, even with the CLIP model, existing CLIP-based FSS methods still suffer from predictions biased towards base classes, caused by class-specific feature-level interactions. To solve this issue, we propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net). It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity. Specifically, the class-relevant textual and visual features are first transformed into a class-agnostic prior in the form of a probability map. Then, a Prior-Guided Mask Assemble Module (PGMAM) comprising multiple General Assemble Units (GAUs) is introduced. It considers diverse and plug-and-play interactions, such as visual-textual, inter- and intra-image, training-free, and high-order ones. Lastly, to ensure class-agnostic ability, a Hierarchical Decoder with Channel-Drop Mechanism (HDCDM) is proposed to flexibly exploit the assembled masks and low-level features without relying on any class-specific information. PGMA-Net achieves new state-of-the-art results on the FSS task, with mIoU of $77.6$ on $\text{PASCAL-}5^i$ and $59.4$ on $\text{COCO-}20^i$ in the 1-shot scenario. Beyond this, we show that, without extra re-training, the proposed PGMA-Net can also solve bbox-level and cross-domain FSS, co-segmentation, and zero-shot segmentation (ZSS) tasks, leading to an any-shot segmentation framework.
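The abstract's first step, turning class-relevant features into a "class-agnostic prior in the form of a probability map", can be pictured as projecting per-pixel visual features onto a class text embedding (CLIP-style) and normalizing the resulting affinity. The sketch below is a minimal illustration of that idea; the function name, shapes, and min-max normalization are assumptions for demonstration, not the authors' actual implementation.

```python
import numpy as np

def prior_probability_map(visual_feats, text_embed):
    """Hypothetical sketch: cosine affinity between pixel features and a
    class text embedding, rescaled into an (H, W) probability map.

    visual_feats: (H, W, C) per-pixel features
    text_embed:   (C,) class embedding
    """
    # L2-normalize both sides so the dot product is a cosine similarity
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    affinity = v @ t  # (H, W) cosine similarity in [-1, 1]
    # Min-max rescale to [0, 1] to act as a probability-map prior
    prior = (affinity - affinity.min()) / (affinity.max() - affinity.min() + 1e-8)
    return prior

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
text = rng.normal(size=16)
p = prior_probability_map(feats, text)
print(p.shape)  # (8, 8)
```

Because the map is built purely from affinity rather than class-specific weights, the same mechanism applies to any class embedding, which is the class-agnostic property the paper relies on.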

Results

Tasks: Few-Shot Learning · Few-Shot Semantic Segmentation · Meta-Learning (identical results are reported under all three leaderboards; consolidated below)

| Dataset | Setting | Model | Mean IoU | FB-IoU |
|---|---|---|---|---|
| PASCAL-5i | 1-shot | PGMA-Net (ResNet-50) | 74.1 | 83.5 |
| PASCAL-5i | 1-shot | PGMA-Net (ResNet-101) | 77.6 | 86.2 |
| PASCAL-5i | 1-shot | PGMA-Net (ViT-B/16) | 74.1 | 82.1 |
| PASCAL-5i | 5-shot | PGMA-Net (ResNet-50) | 75.2 | 84.2 |
| PASCAL-5i | 5-shot | PGMA-Net (ResNet-101) | 78.6 | 86.9 |
| PASCAL-5i | 5-shot | PGMA-Net (ViT-B/16) | 74.6 | 82.5 |
| COCO-20i | 1-shot | PGMA-Net (ResNet-50) | 54.3 | 75.8 |
| COCO-20i | 1-shot | PGMA-Net (ResNet-101) | 59.4 | 78.5 |
| COCO-20i | 5-shot | PGMA-Net (ResNet-50) | 57.1 | 76.7 |
| COCO-20i | 5-shot | PGMA-Net (ResNet-101) | 61.8 | 79.4 |
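The two metrics in the table can be computed from binary masks: IoU is intersection over union for the foreground, and FB-IoU (as commonly defined in the FSS literature; an assumption here, not stated on this page) averages the foreground IoU with the IoU of the inverted (background) masks. A minimal sketch:

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def fb_iou(pred, gt):
    """Foreground-background IoU: average of foreground and background IoU."""
    return 0.5 * (iou(pred, gt) + iou(~pred, ~gt))

pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=bool)
gt   = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]], dtype=bool)
print(round(iou(pred, gt), 3), round(fb_iou(pred, gt), 3))  # 0.5 0.607
```

Mean IoU on PASCAL-5i/COCO-20i then averages the per-class foreground IoU over the novel classes of each fold.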

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
- Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)