Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects

Jian Hu, Jiayi Lin, Weitong Cai, Shaogang Gong

Published: 2023-12-12 · Tasks: Test-time Adaptation · Object Detection · Camouflaged Object Segmentation with a Single Task-generic Prompt

Paper · PDF · Code (official)

Abstract

Camouflaged object detection (COD) approaches rely heavily on pixel-level annotated datasets. Weakly-supervised COD (WSCOD) approaches use sparse annotations such as scribbles or points to reduce annotation effort, but at the cost of accuracy. The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts such as points. However, manual prompts are not always feasible, as they may be unavailable in real-world applications. Moreover, such prompts provide only localization information rather than semantic information, which can intrinsically cause ambiguity in interpreting the targets. In this work, we aim to eliminate the need for manual prompts. The key idea is to employ Cross-modal Chains of Thought Prompting (CCTP) to reason visual prompts from the semantic information given by a generic text prompt. To that end, we introduce a test-time, per-instance adaptation mechanism called Generalizable SAM (GenSAM), which automatically generates and optimizes visual prompts from the generic task prompt for WSCOD. In particular, CCTP maps a single generic text prompt onto image-specific consensus foreground and background heatmaps using vision-language models, yielding reliable visual prompts. Moreover, to adapt the visual prompts at test time, we further propose Progressive Mask Generation (PMG), which iteratively reweights the input image, guiding the model to focus on the targets in a coarse-to-fine manner. Crucially, all network parameters are fixed, avoiding the need for additional training. Experiments on three benchmarks demonstrate that GenSAM outperforms point-supervision approaches and achieves results comparable to scribble-supervision ones, relying solely on general task descriptions as prompts. Our code is available at: https://lwpyh.github.io/GenSAM/.
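The test-time loop the abstract describes — consensus heatmaps derived from a generic text prompt, point prompts extracted from those heatmaps, and progressive reweighting of the input image — can be sketched roughly as below. This is a minimal NumPy illustration, not the authors' implementation: `consensus_heatmap`, `prompts_from_heatmap`, and `reweight_image` are hypothetical helpers, and the real method obtains its heatmaps from vision-language models rather than from precomputed arrays.

```python
import numpy as np

def consensus_heatmap(heatmaps):
    """Average heatmaps produced under several prompt phrasings (CCTP-style consensus)."""
    return np.mean(heatmaps, axis=0)

def prompts_from_heatmap(fg, bg):
    """Take the peak foreground/background locations as point prompts for a promptable segmenter."""
    fg_point = np.unravel_index(np.argmax(fg), fg.shape)
    bg_point = np.unravel_index(np.argmax(bg), bg.shape)
    return fg_point, bg_point

def reweight_image(image, mask, alpha=0.5):
    """PMG-style step: emphasize regions the current mask marks as foreground.

    alpha is a hypothetical blending weight; pixels outside the mask are dimmed,
    pixels inside it are kept, so the next iteration focuses coarse-to-fine.
    """
    w = alpha + (1.0 - alpha) * mask
    return image * w[..., None]
```

The frozen-parameter aspect is reflected here implicitly: nothing is trained, only the prompts and the input image are updated between iterations.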

Results

Task: Camouflaged Object Segmentation / Object Detection · Model: GenSAM

| Dataset | S_{\alpha} ↑ | F_{\beta} ↑ | E_{\phi} ↑ | MAE ↓ |
|---|---|---|---|---|
| CAMO | 0.719 | 0.659 | 0.775 | 0.113 |
| COD10K | 0.775 | 0.681 | 0.838 | 0.067 |
| Chameleon | 0.764 | 0.680 | 0.807 | 0.090 |
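The metrics reported for these benchmarks are standard in the camouflaged-object literature. MAE and the threshold-based F-measure are straightforward to compute; a minimal sketch (assuming prediction maps in [0, 1], binary ground truth, and the customary β² = 0.3 weighting) might look like:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0, 1] prediction map and a binary ground truth."""
    return float(np.mean(np.abs(pred - gt)))

def f_beta(pred, gt, beta2=0.3, thresh=0.5):
    """F-measure at a fixed threshold; beta^2 = 0.3 is conventional in COD/SOD papers."""
    binary = pred >= thresh
    tp = np.logical_and(binary, gt == 1).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt == 1).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))
```

S_{\alpha} (structure measure) and E_{\phi} (enhanced-alignment measure) have more involved definitions and are usually computed with the reference evaluation toolkits rather than reimplemented.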

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
- Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
- ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
- Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)