Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Object Segmentation using Space-Time Memory Networks

Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim

2019-04-01 | ICCV 2019 10
Tasks: Semi-Supervised Video Object Segmentation, One-shot visual object segmentation, Semantic Segmentation, Video Object Segmentation, Interactive Video Object Segmentation, Video Semantic Segmentation

Abstract

We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the YouTube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets respectively) while having a fast runtime (0.16 seconds/frame on the DAVIS 2016 val set).
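The dense matching between the query and the memory described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not state how the matching is computed, so the dot-product similarity, the softmax normalization, the key/value split, and the name memory_read are all assumptions made for illustration.

```python
import numpy as np

def memory_read(query_key, memory_keys, memory_values):
    """Read memory values for every query pixel via soft matching.

    Assumed shapes (hypothetical, for illustration only):
      query_key:     (H*W, Ck)    features of the current (query) frame
      memory_keys:   (T*H*W, Ck)  features of all T memory frames
      memory_values: (T*H*W, Cv)  mask-aware features of the memory frames
    Returns an array of shape (H*W, Cv).
    """
    # Similarity between every query location and every
    # space-time memory location (assumed: plain dot product).
    similarity = query_key @ memory_keys.T            # (H*W, T*H*W)

    # Softmax over the memory axis, shifted for numerical stability.
    similarity = similarity - similarity.max(axis=1, keepdims=True)
    weights = np.exp(similarity)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each query pixel receives a weighted sum of memory values.
    return weights @ memory_values


# Toy example: 2 memory frames of 2x3 pixels, one query frame of 2x3 pixels.
rng = np.random.default_rng(0)
query = rng.normal(size=(6, 4))
mem_k = rng.normal(size=(12, 4))
mem_v = rng.normal(size=(12, 8))
read = memory_read(query, mem_k, mem_v)
print(read.shape)
```

Because the weights for each query pixel sum to one over all memory locations, adding more past frames to the memory enlarges the set of candidates being matched without changing the shape of the output.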

Results

Every row below is listed under the tasks Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation, except rows marked *, which are listed only under Video and Video Object Segmentation.

Dataset | Metric | Value | Model
DAVIS 2017 (val) | F-measure * | 84.3 | STM
DAVIS 2017 (val) | Jaccard * | 79.2 | STM
DAVIS 2017 (val) | F-measure (Decay) | 10.5 | STM
DAVIS 2017 (val) | F-measure (Mean) | 84.3 | STM
DAVIS 2017 (val) | F-measure (Recall) | 91.8 | STM
DAVIS 2017 (val) | J&F | 81.75 | STM
DAVIS 2017 (val) | Jaccard (Decay) | 8 | STM
DAVIS 2017 (val) | Jaccard (Mean) | 79.2 | STM
DAVIS 2017 (val) | Jaccard (Recall) | 88.7 | STM
DAVIS 2016 | F-measure (Decay) | 4.2 | STM
DAVIS 2016 | F-measure (Mean) | 90.1 | STM
DAVIS 2016 | F-measure (Recall) | 95.2 | STM
DAVIS 2016 | J&F | 89.4 | STM
DAVIS 2016 | Jaccard (Decay) | 5 | STM
DAVIS 2016 | Jaccard (Mean) | 88.7 | STM
DAVIS 2016 | Jaccard (Recall) | 97.4 | STM
DAVIS 2017 (test-dev) | F-measure (Decay) | 17.5 | STM
DAVIS 2017 (test-dev) | F-measure (Mean) | 75.2 | STM
DAVIS 2017 (test-dev) | F-measure (Recall) | 83 | STM
DAVIS 2017 (test-dev) | J&F | 72.2 | STM
DAVIS 2017 (test-dev) | Jaccard (Decay) | 16.9 | STM
DAVIS 2017 (test-dev) | Jaccard (Mean) | 69.3 | STM
DAVIS 2017 (test-dev) | Jaccard (Recall) | 78 | STM
DAVIS (no YouTube-VOS training) | D16 val (F) | 88.1 | STM
DAVIS (no YouTube-VOS training) | D16 val (G) | 86.5 | STM
DAVIS (no YouTube-VOS training) | D16 val (J) | 84.8 | STM
DAVIS (no YouTube-VOS training) | D17 val (F) | 74 | STM
DAVIS (no YouTube-VOS training) | D17 val (G) | 71.6 | STM
DAVIS (no YouTube-VOS training) | D17 val (J) | 69.2 | STM
DAVIS (no YouTube-VOS training) | FPS | 6.25 | STM
YouTube-VOS 2018 | Overall | 68.2 | STM
DAVIS 2017 | AUC-J&F * | 0.803 | STM
DAVIS 2017 | J&F@60s * | 0.848 | STM
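
The metrics in the results above are related by simple arithmetic, which the following sketch makes explicit. It assumes the usual DAVIS conventions: Jaccard (J) is the intersection-over-union of the predicted and ground-truth masks, and J&F is the mean of the Jaccard and F-measure means. The function names are hypothetical, and treating two empty masks as a perfect match is an assumption of this sketch rather than something stated on this page.

```python
import numpy as np

def jaccard(pred_mask, gt_mask):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j_mean, f_mean):
    """Overall score: average of the Jaccard mean and the F-measure mean."""
    return (j_mean + f_mean) / 2.0

def frames_per_second(seconds_per_frame):
    """Convert a per-frame runtime into throughput."""
    return 1.0 / seconds_per_frame

# Values taken from the results listed on this page.
print(round(j_and_f(79.2, 84.3), 2))        # DAVIS 2017 (val): 81.75
print(round(j_and_f(88.7, 90.1), 2))        # DAVIS 2016: 89.4
print(round(frames_per_second(0.16), 2))    # 0.16 s/frame: 6.25
```

The reported J&F of 81.75 on DAVIS 2017 (val) and 89.4 on DAVIS 2016 match the averages of their J and F means, and the FPS entry of 6.25 corresponds to the 0.16 seconds/frame runtime quoted in the abstract.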

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)