Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation

Suhwan Cho, Seunghoon Lee, Minhyeok Lee, Jungho Lee, Sangyoun Lee

Date: 2025-03-05
Tasks: Referring Video Object Segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

Referring video object segmentation aims to segment and track a target object in a video using a natural language prompt. Existing methods typically fuse visual and textual features in a highly entangled manner, processing multi-modal information together to generate per-frame masks. However, this approach often struggles with ambiguous target identification, particularly in scenes with multiple similar objects, and fails to ensure consistent mask propagation across frames. To address these limitations, we introduce FindTrack, a novel decoupled framework that separates target identification from mask propagation. FindTrack first adaptively selects a key frame by balancing segmentation confidence and vision-text alignment, establishing a robust reference for the target object. This reference is then utilized by a dedicated propagation module to track and segment the object across the entire video. By decoupling these processes, FindTrack effectively reduces ambiguities in target association and enhances segmentation consistency. We demonstrate that FindTrack outperforms existing methods on public benchmarks.
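The key-frame selection step described in the abstract can be illustrated with a minimal sketch. The abstract says only that the key frame is chosen by "balancing segmentation confidence and vision-text alignment"; it does not give the scoring rule. The weighted sum below, the function name `select_key_frame`, and the weight `alpha` are all assumptions made for illustration, not the authors' formulation.

```python
from typing import Sequence


def select_key_frame(
    seg_confidence: Sequence[float],
    text_alignment: Sequence[float],
    alpha: float = 0.5,
) -> int:
    """Return the index of the frame with the highest combined score.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score = alpha * seg_confidence + (1 - alpha) * text_alignment
    Both inputs hold one value per frame, assumed to lie in [0, 1].
    """
    if len(seg_confidence) != len(text_alignment):
        raise ValueError("per-frame score lists must have the same length")
    if not seg_confidence:
        raise ValueError("at least one frame is required")

    scores = [
        alpha * conf + (1.0 - alpha) * align
        for conf, align in zip(seg_confidence, text_alignment)
    ]
    # Ties resolve to the earliest frame because max() keeps the first maximum.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    confidence = [0.9, 0.6, 0.8]  # made-up per-frame segmentation confidence
    alignment = [0.2, 0.7, 0.9]   # made-up per-frame vision-text alignment
    print(select_key_frame(confidence, alignment))
```

In a decoupled pipeline of this kind, the mask predicted on the selected frame would then be handed to a separate propagation module; that stage is not sketched here.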

Results

Task                       Dataset            Metric  Value  Model
Video                      MeViS              F       55.9   FindTrack
Video                      MeViS              J       50.5   FindTrack
Video                      MeViS              J&F     53.2   FindTrack
Video                      Refer-YouTube-VOS  F       75.7   FindTrack
Video                      Refer-YouTube-VOS  J       71.8   FindTrack
Video                      Refer-YouTube-VOS  J&F     73.7   FindTrack
Video                      Ref-DAVIS17        F       78.5   FindTrack
Video                      Ref-DAVIS17        J       69.9   FindTrack
Video                      Ref-DAVIS17        J&F     74.2   FindTrack
Video Object Segmentation  MeViS              F       55.9   FindTrack
Video Object Segmentation  MeViS              J       50.5   FindTrack
Video Object Segmentation  MeViS              J&F     53.2   FindTrack
Video Object Segmentation  Refer-YouTube-VOS  F       75.7   FindTrack
Video Object Segmentation  Refer-YouTube-VOS  J       71.8   FindTrack
Video Object Segmentation  Refer-YouTube-VOS  J&F     73.7   FindTrack
Video Object Segmentation  Ref-DAVIS17        F       78.5   FindTrack
Video Object Segmentation  Ref-DAVIS17        J       69.9   FindTrack
Video Object Segmentation  Ref-DAVIS17        J&F     74.2   FindTrack
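
For readers unfamiliar with the metrics in the results above: in video object segmentation benchmarks, J conventionally denotes region similarity (the Jaccard index, or intersection over union, between predicted and ground-truth masks), F denotes contour accuracy, and J&F is the mean of the two. The sketch below shows the Jaccard computation on small binary masks and the J&F average. The function names are illustrative, and the contour-based F measure is omitted because it needs boundary matching that is beyond a short example.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 values


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks are treated as a perfect match (a common convention).
    return 1.0 if union == 0 else intersection / union


def j_and_f(j: float, f: float) -> float:
    """Mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [0, 0]]
    print(jaccard(pred, gt))             # one shared pixel out of two covered
    print(round(j_and_f(50.5, 55.9), 1)) # MeViS J and F from the results
```

Averaging the rounded J and F values reproduces the reported J&F for MeViS and Ref-DAVIS17. For Refer-YouTube-VOS the rounded values average to 73.75 against a reported 73.7, which is consistent with the reported figure having been computed from unrounded scores.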

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)