
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

2021-03-14, CVPR 2021
Tasks: Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Interactive Video Object Segmentation, Video Semantic Segmentation

Abstract

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-$k$ filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code and facilitate future research.
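The top-$k$ filtering step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that, for each query position, only the k largest affinities to memory positions are kept, a softmax is taken over those, and all other memory positions get zero weight. The function names `topk_softmax` and `read_memory` and the toy numbers are hypothetical.

```python
import math

def topk_softmax(affinities, k):
    """Softmax restricted to the k largest affinities; all others get weight 0.

    Assumed form of top-k filtering, for illustration only.
    """
    if not 1 <= k <= len(affinities):
        raise ValueError("k must be between 1 and len(affinities)")
    # Indices of the k largest affinity scores.
    top = sorted(range(len(affinities)),
                 key=lambda i: affinities[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(affinities[i] for i in top)
    exps = {i: math.exp(affinities[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0
            for i in range(len(affinities))]

def read_memory(affinities, values, k):
    """Weighted sum of memory value vectors using the top-k filtered weights."""
    weights = topk_softmax(affinities, k)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# One query position attending over four memory positions (toy data).
affinities = [2.0, 0.1, 1.5, -3.0]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]
weights = topk_softmax(affinities, k=2)
readout = read_memory(affinities, values, k=2)
```

With k=2, only the first and third memory positions contribute to the readout, so a low-affinity entry such as the fourth one cannot add noise to the result regardless of its value vector.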

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | DAVIS 2017 (val) | F-measure (Decay) | 8.2 | MiVOS |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 87.4 | MiVOS |
| Video | DAVIS 2017 (val) | F-measure (Recall) | 93.1 | MiVOS |
| Video | DAVIS 2017 (val) | J&F | 84.5 | MiVOS |
| Video | DAVIS 2017 (val) | Jaccard (Decay) | 7 | MiVOS |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 81.7 | MiVOS |
| Video | DAVIS 2017 (val) | Jaccard (Recall) | 90.9 | MiVOS |
| Video | DAVIS 2017 (val) | Speed (FPS) | 11.2 | MiVOS |
| Video | DAVIS 2016 | F-measure (Decay) | 5.1 | MiVOS |
| Video | DAVIS 2016 | F-measure (Mean) | 92.4 | MiVOS |
| Video | DAVIS 2016 | F-measure (Recall) | 96.4 | MiVOS |
| Video | DAVIS 2016 | J&F | 91 | MiVOS |
| Video | DAVIS 2016 | Jaccard (Decay) | 6.6 | MiVOS |
| Video | DAVIS 2016 | Jaccard (Mean) | 89.7 | MiVOS |
| Video | DAVIS 2016 | Jaccard (Recall) | 97.5 | MiVOS |
| Video | DAVIS 2016 | Speed (FPS) | 16.9 | MiVOS |
| Video | DAVIS 2017 (test-dev) | F-measure (Decay) | 14.5 | MiVOS |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 80.2 | MiVOS |
| Video | DAVIS 2017 (test-dev) | F-measure (Recall) | 87.6 | MiVOS |
| Video | DAVIS 2017 (test-dev) | J&F | 76.5 | MiVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Decay) | 14.9 | MiVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 72.7 | MiVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Recall) | 81.2 | MiVOS |
| Video | YouTube-VOS 2018 | F-Measure (Seen) | 84.7 | MiVOS |
| Video | YouTube-VOS 2018 | F-Measure (Unseen) | 85.5 | MiVOS |
| Video | YouTube-VOS 2018 | Jaccard (Seen) | 80.6 | MiVOS |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 77.3 | MiVOS |
| Video | YouTube-VOS 2018 | Overall | 82 | MiVOS |
| Video | DAVIS 2017 | AUC-J | 0.849 | MiVOS |
| Video | DAVIS 2017 | AUC-J&F | 0.879 | MiVOS |
| Video | DAVIS 2017 | J&F@60s | 0.885 | MiVOS |
| Video | DAVIS 2017 | J@60s | 0.854 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 8.2 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 87.4 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 93.1 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | J&F | 84.5 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 7 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 81.7 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 90.9 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Speed (FPS) | 11.2 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 5.1 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 92.4 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 96.4 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | J&F | 91 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 6.6 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 89.7 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 97.5 | MiVOS |
| Video Object Segmentation | DAVIS 2016 | Speed (FPS) | 16.9 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 14.5 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 80.2 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 87.6 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 76.5 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 14.9 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 72.7 | MiVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 81.2 | MiVOS |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 84.7 | MiVOS |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 85.5 | MiVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 80.6 | MiVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 77.3 | MiVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Overall | 82 | MiVOS |
| Video Object Segmentation | DAVIS 2017 | AUC-J | 0.849 | MiVOS |
| Video Object Segmentation | DAVIS 2017 | AUC-J&F | 0.879 | MiVOS |
| Video Object Segmentation | DAVIS 2017 | J&F@60s | 0.885 | MiVOS |
| Video Object Segmentation | DAVIS 2017 | J@60s | 0.854 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 8.2 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 87.4 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 93.1 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 84.5 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 7 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 81.7 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 90.9 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Speed (FPS) | 11.2 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 5.1 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 92.4 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 96.4 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 91 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 6.6 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 89.7 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 97.5 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Speed (FPS) | 16.9 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 14.5 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 80.2 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 87.6 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 76.5 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 14.9 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 72.7 | MiVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 81.2 | MiVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 84.7 | MiVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 85.5 | MiVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 80.6 | MiVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 77.3 | MiVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Overall | 82 | MiVOS |
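
For readers unfamiliar with the metrics in the results above, the following sketch shows how a Jaccard index (region similarity, J) is computed for a pair of binary masks, and how the combined scores relate to their components. It assumes the usual benchmark conventions: J&F is the average of the mean Jaccard and mean F-measure, and the YouTube-VOS overall score is the average of the four seen/unseen J and F values. The helper names and the toy masks are hypothetical, and returning 1.0 for two empty masks is an assumed convention.

```python
def jaccard(pred, gt):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = sum(p & g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p | g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)

# Toy 2x3 masks: 2 overlapping pixels, 4 pixels in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
iou = jaccard(pred, gt)

# Combined scores recomputed from the per-metric values listed above.
davis17_val_jf = mean([81.7, 87.4])              # Jaccard (Mean), F-measure (Mean)
davis16_jf = mean([89.7, 92.4])
ytvos_overall = mean([84.7, 85.5, 80.6, 77.3])   # F seen, F unseen, J seen, J unseen
```

The recomputed values agree with the listed J&F and Overall entries to within rounding of the published one-decimal inputs.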

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)