Miguel Espinosa, Chenhongyi Yang, Linus Ericsson, Steven McDonagh, Elliot J. Crowley
The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this problem through a promptable, semantics-agnostic segmentation paradigm, yet it still requires manual visual prompts or complex, domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided, instead, with only a small set of reference images. Our key insight is to leverage the strong semantic priors learned by foundation models to identify corresponding regions between a reference and a target image. We find that these correspondences enable the automatic generation of instance-level segmentation masks for downstream tasks, and we instantiate our ideas via a multi-stage, training-free method comprising (1) memory bank construction, (2) representation aggregation, and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP) and PASCAL VOC Few-Shot (71.2% nAP50), and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
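The three stages named above can be sketched in a minimal, hedged form. The snippet below is an illustrative NumPy implementation, not the paper's actual code: it assumes reference images have already been encoded into per-class patch features by some foundation-model backbone (e.g. a ViT), builds a memory bank of unit-normalised class prototypes by mean aggregation, and matches target patch features to the bank by cosine similarity with a hypothetical confidence threshold. All function names and the threshold value are assumptions for illustration.

```python
import numpy as np

def build_memory_bank(ref_feats):
    """Stage 1+2 (sketch): aggregate reference patch features into class prototypes.

    ref_feats: dict mapping class name -> list of (N_i, D) arrays of
    foundation-model patch features extracted inside reference masks.
    Returns a dict mapping class name -> unit-normalised (D,) prototype.
    """
    bank = {}
    for cls, feats in ref_feats.items():
        stacked = np.concatenate(feats, axis=0)        # (sum_i N_i, D)
        proto = stacked.mean(axis=0)                   # simple mean aggregation
        bank[cls] = proto / np.linalg.norm(proto)      # unit-normalise
    return bank

def match_features(target_feats, bank, threshold=0.5):
    """Stage 3 (sketch): semantic-aware matching of target patches to the bank.

    target_feats: (P, D) patch features of the target image.
    Returns per-patch class labels (-1 = no match) and cosine confidences.
    """
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    classes = list(bank.keys())
    protos = np.stack([bank[c] for c in classes])      # (C, D)
    sims = t @ protos.T                                # cosine similarities (P, C)
    best = sims.argmax(axis=1)
    conf = sims.max(axis=1)
    labels = np.array(
        [classes[i] if c >= threshold else -1 for i, c in zip(best, conf)],
        dtype=object,
    )
    return labels, conf
```

In the full method, the matched high-confidence patches would then be converted into point or box prompts for SAM to produce instance masks; the threshold here stands in for whatever semantic-aware filtering the paper applies.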
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Few-Shot Object Detection | MS-COCO (1-shot) | AP | 26.5 | Training-free |
| Few-Shot Object Detection | MS-COCO (10-shot) | AP | 36.6 | Training-free |
| Few-Shot Object Detection | MS-COCO (30-shot) | AP | 36.8 | Training-free |
| Few-Shot Object Detection | ArTaxOr | mAP | 35.0 | Training-free (w/o FT) |
| Few-Shot Object Detection | NEU-DET | mAP | 5.5 | Training-free (w/o FT) |
| Few-Shot Object Detection | DIOR | mAP | 16.4 | Training-free (w/o FT) |
| Few-Shot Object Detection | Clipart1k | mAP | 25.9 | Training-free (w/o FT) |
| Few-Shot Object Detection | DeepFish | mAP | 29.6 | Training-free (w/o FT) |
| Few-Shot Object Detection | UODD | mAP | 16.0 | Training-free (w/o FT) |