Bridge the Points: Graph-based Few-shot Segment Anything Semantically

Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei

2024-10-09
Few-Shot Semantic Segmentation, Semantic Segmentation

Abstract

Recent advances in large-scale pre-training have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks from point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle to select suitable prompts, require specific hyperparameter settings for different scenarios, and suffer from prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, notably uncovering the potential of the background context as the negative reference. A subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by efficiently decomposing the directed graph into its weakly connected components, constructing distinct natural clusters. Finally, positive and overshooting gating, benefiting from the graph-based granularity alignment, aggregates high-confidence masks and filters out false-positive masks for the final prediction, reducing the need for additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validates the effectiveness and efficiency of the proposed approach, which surpasses state-of-the-art generalist models with an mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available at https://andyzaq.github.io/GF-SAM/.
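
The Point-Mask Clustering step described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes each selected point has already produced one binary mask, that a directed edge i -> j exists whenever mask i covers point j, and that clusters are the weakly connected components of that graph (found here with union-find, which ignores edge direction). The function name and array layout are hypothetical.

```python
import numpy as np


def weakly_connected_clusters(points, masks):
    """Group points into weakly connected components of a coverage graph.

    points: (N, 2) integer array of (row, col) coordinates (assumed layout).
    masks:  (N, H, W) boolean array, where masks[i] is assumed to be the
            mask generated from points[i].
    Returns a sorted list of clusters, each a list of point indices.
    """
    n = len(points)
    # covers[i, j] is True when mask i contains point j: directed edge i -> j.
    covers = masks[:, points[:, 0], points[:, 1]]

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Weak connectivity ignores edge direction, so every edge is a union.
    for i, j in np.argwhere(covers).tolist():
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy example on a 6x6 grid with four points and their (made-up) masks.
demo_points = np.array([[1, 1], [1, 2], [4, 4], [4, 1]])
demo_masks = np.zeros((4, 6, 6), dtype=bool)
demo_masks[0, 0:3, 0:4] = True  # mask 0 covers points 0 and 1
demo_masks[1, 1, 2] = True      # mask 1 covers only point 1
demo_masks[2, 3:6, 3:6] = True  # mask 2 covers only point 2
demo_masks[3, 3:6, 0:3] = True  # mask 3 covers only point 3
demo_clusters = weakly_connected_clusters(demo_points, demo_masks)
```

In the toy example, points 0 and 1 fall into one cluster because mask 0 covers both, while points 2 and 3 remain singletons. Whether the paper builds edges in exactly this way, or applies thresholds to mask coverage, is not stated in the abstract.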

Results

Task	Dataset	Metric	Value	Model
Few-Shot Learning	FSS-1000 (5-shot)	Mean IoU	88.9	GF-SAM (DINOv2)
Few-Shot Learning	COCO-20i (5-shot)	Mean IoU	66.8	GF-SAM (DINOv2)
Few-Shot Learning	FSS-1000 (1-shot)	Mean IoU	88	GF-SAM (DINOv2)
Few-Shot Learning	PASCAL-5i (1-Shot)	Mean IoU	72.1	GF-SAM (DINOv2)
Few-Shot Learning	COCO-20i (1-shot)	Mean IoU	58.7	GF-SAM (DINOv2)
Few-Shot Learning	PASCAL-5i (5-Shot)	Mean IoU	82.6	GF-SAM (DINOv2)
Few-Shot Semantic Segmentation	FSS-1000 (5-shot)	Mean IoU	88.9	GF-SAM (DINOv2)
Few-Shot Semantic Segmentation	COCO-20i (5-shot)	Mean IoU	66.8	GF-SAM (DINOv2)
Few-Shot Semantic Segmentation	FSS-1000 (1-shot)	Mean IoU	88	GF-SAM (DINOv2)
Few-Shot Semantic Segmentation	PASCAL-5i (1-Shot)	Mean IoU	72.1	GF-SAM (DINOv2)
Few-Shot Semantic Segmentation	COCO-20i (1-shot)	Mean IoU	58.7	GF-SAM (DINOv2)
Few-Shot Semantic Segmentation	PASCAL-5i (5-Shot)	Mean IoU	82.6	GF-SAM (DINOv2)
Meta-Learning	FSS-1000 (5-shot)	Mean IoU	88.9	GF-SAM (DINOv2)
Meta-Learning	COCO-20i (5-shot)	Mean IoU	66.8	GF-SAM (DINOv2)
Meta-Learning	FSS-1000 (1-shot)	Mean IoU	88	GF-SAM (DINOv2)
Meta-Learning	PASCAL-5i (1-Shot)	Mean IoU	72.1	GF-SAM (DINOv2)
Meta-Learning	COCO-20i (1-shot)	Mean IoU	58.7	GF-SAM (DINOv2)
Meta-Learning	PASCAL-5i (5-Shot)	Mean IoU	82.6	GF-SAM (DINOv2)
