Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection

Xin Zhang, Keren Fu, Qijun Zhao

2025-04-01
Camouflaged Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Object Detection

Abstract

The Segment Anything Model 2 (SAM2), a prompt-guided video foundation model, has performed remarkably well in video object segmentation, drawing significant attention in the community. Because camouflaged objects closely resemble their surroundings, which makes them difficult to distinguish even for the human eye, applying SAM2 to automated segmentation in real-world scenarios faces challenges in camouflage perception and reliable prompt generation. To address these issues, we propose CamoSAM2, a motion-appearance prompt inducer (MAPI) and refinement framework that automatically generates and refines prompts for SAM2, enabling high-quality automatic detection and segmentation in the video camouflaged object detection (VCOD) task. First, we introduce a prompt inducer that integrates motion and appearance cues simultaneously to detect camouflaged objects, delivering more accurate initial predictions than existing methods. Second, we propose a video-based adaptive multi-prompts refinement (AMPR) strategy tailored for SAM2, aimed at mitigating prompt errors in the initial coarse masks and producing better prompts. Specifically, we introduce a novel three-step process that generates reliable prompts through camouflaged object determination, pivotal prompting frame selection, and multi-prompts formation. Extensive experiments on two benchmark datasets demonstrate that our proposed model, CamoSAM2, significantly outperforms existing state-of-the-art methods, achieving increases of 8.0% and 10.1% in the mIoU metric. Additionally, our method achieves the fastest inference speed among current VCOD models.
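
To make the idea of forming prompts from a coarse mask more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' AMPR procedure: the abstract does not say which prompt types are formed or how, so the box-plus-point choice, the function name `mask_to_prompts`, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np

def mask_to_prompts(prob_mask, threshold=0.5):
    """Turn a coarse probability mask (H, W) into a box and a point prompt.

    Hypothetical helper for illustration only; returns None when no pixel
    reaches the threshold (no object found in this frame).
    """
    fg = prob_mask >= threshold
    if not fg.any():
        return None
    ys, xs = np.nonzero(fg)
    # Tight bounding box around the foreground, as (x_min, y_min, x_max, y_max).
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # Pick the foreground pixel nearest to the centroid, so the point is
    # guaranteed to lie on the predicted object even for concave shapes.
    cy, cx = ys.mean(), xs.mean()
    nearest = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    point = (int(xs[nearest]), int(ys[nearest]))
    return {"box": box, "point": point}

# Toy coarse mask: a 3x4 confident region inside a 6x8 frame.
coarse = np.zeros((6, 8))
coarse[2:5, 3:7] = 0.9
prompts = mask_to_prompts(coarse)
```

Snapping the point to an actual foreground pixel is a design choice of this sketch: a raw centroid can fall outside a ring-shaped or concave mask, which would give a misleading positive click.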

Results

| Dataset | Metric | Value | Model |
|---|---|---|---|
| MoCA-Mask | MAE | 0.007 | CamoSAM2 |
| MoCA-Mask | S-measure | 0.765 | CamoSAM2 |
| MoCA-Mask | mDice | 0.62 | CamoSAM2 |
| MoCA-Mask | mIoU | 0.542 | CamoSAM2 |
| MoCA-Mask | weighted F-measure | 0.607 | CamoSAM2 |

The same five results are listed under each of the following task categories: Object Detection, 3D, Camouflaged Object Segmentation, Object Segmentation, 2D Classification, 2D Object Detection, 16k.
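
For readers unfamiliar with the metrics reported here, the following is a minimal sketch of per-frame MAE, IoU, and Dice using NumPy. It follows the standard textbook definitions; the benchmark's official evaluation code may differ in details such as thresholding and how scores are averaged over frames and videos, and the function names and the 0.5 threshold are assumptions of this sketch. S-measure and weighted F-measure are more involved and are not covered.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a prediction map and a ground-truth mask in [0, 1]."""
    return float(np.mean(np.abs(pred.astype(float) - gt.astype(float))))

def iou_and_dice(pred, gt, threshold=0.5):
    """IoU and Dice for one frame after binarizing both maps at `threshold`."""
    p = pred >= threshold
    g = gt >= threshold
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    if union == 0:
        # Both masks are empty: treat as a perfect match (a convention, not a standard).
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (p.sum() + g.sum())
    return float(iou), float(dice)

# Toy frame: two 2x2 squares that overlap in one column.
gt = np.zeros((4, 4))
gt[1:3, 1:3] = 1
pred = np.zeros((4, 4))
pred[1:3, 2:4] = 1

frame_mae = mae(pred, gt)
frame_iou, frame_dice = iou_and_dice(pred, gt)
```

mIoU and mDice would then be means of these per-frame scores over the evaluation set, for example `np.mean(list_of_frame_ious)`.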

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)