Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Temporally Consistent Referring Video Object Segmentation with Hybrid Memory

Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian

2024-03-28
Referring Video Object Segmentation, Referring Expression Segmentation, Segmentation, Semantic Segmentation, Video Segmentation, Video Object Segmentation, Video Semantic Segmentation, HTR

Abstract

Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%). The code is available at https://github.com/bo-miao/HTR.

Results

Task | Dataset | Metric | Value | Model
Video | MeViS | F | 45.5 | HTR
Video | MeViS | J | 39.9 | HTR
Video | MeViS | J&F | 42.7 | HTR
Video | Refer-YouTube-VOS | F | 68.9 | HTR
Video | Refer-YouTube-VOS | J | 65.3 | HTR
Video | Refer-YouTube-VOS | J&F | 67.1 | HTR
Instance Segmentation | Refer-YouTube-VOS (2021 public validation) | F | 68.9 | HTR (Pre-training)
Instance Segmentation | Refer-YouTube-VOS (2021 public validation) | J | 65.3 | HTR (Pre-training)
Instance Segmentation | Refer-YouTube-VOS (2021 public validation) | J&F | 67.1 | HTR (Pre-training)
Instance Segmentation | DAVIS 2017 (val) | J&F 1st frame | 65.6 | HTR
Video Object Segmentation | MeViS | F | 45.5 | HTR
Video Object Segmentation | MeViS | J | 39.9 | HTR
Video Object Segmentation | MeViS | J&F | 42.7 | HTR
Video Object Segmentation | Refer-YouTube-VOS | F | 68.9 | HTR
Video Object Segmentation | Refer-YouTube-VOS | J | 65.3 | HTR
Video Object Segmentation | Refer-YouTube-VOS | J&F | 67.1 | HTR
Referring Expression Segmentation | Refer-YouTube-VOS (2021 public validation) | F | 68.9 | HTR (Pre-training)
Referring Expression Segmentation | Refer-YouTube-VOS (2021 public validation) | J | 65.3 | HTR (Pre-training)
Referring Expression Segmentation | Refer-YouTube-VOS (2021 public validation) | J&F | 67.1 | HTR (Pre-training)
Referring Expression Segmentation | DAVIS 2017 (val) | J&F 1st frame | 65.6 | HTR
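
The page does not define the metrics in the table. Under the usual DAVIS-style convention, J is the region similarity (intersection over union of the predicted and ground-truth masks), F is a boundary F-measure, and J&F is their arithmetic mean. The reported numbers are consistent with that reading. The sketch below illustrates J on toy masks and recomputes J&F from the J and F values listed above. The function names, the toy masks, and the choice to return 1.0 when both masks are empty are assumptions made for illustration and are not taken from the HTR code.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are equal-sized nested lists of 0/1 values.
    Assumption: two empty masks count as a perfect match (J = 1.0).
    """
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2.0


# Toy 2x3 masks (hypothetical data): 2 pixels overlap, 4 pixels in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(jaccard(pred, gt))

# Recompute J&F from the J and F values reported in the table.
print(round(j_and_f(65.3, 68.9), 1))  # Refer-YouTube-VOS
print(round(j_and_f(39.9, 45.5), 1))  # MeViS
```

The boundary measure F is left out of the sketch because it requires contour extraction and matching, which benchmark toolkits implement in their own ways.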

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)