Data sourced from the PWC Archive (CC-BY-SA 4.0).

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

2021-06-09 · NeurIPS 2021 (December)
Tasks: Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video object segmentation. Unlike most existing approaches, we establish correspondences directly between frames without re-encoding the mask features for every object, leading to a highly efficient and robust framework. With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion. We cast the aggregation process as a voting problem and find that the existing inner-product affinity leads to poor use of memory, with a small (fixed) subset of memory nodes dominating the votes regardless of the query. In light of this phenomenon, we propose using the negative squared Euclidean distance instead to compute the affinities. We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting benefits both memory efficiency and inference accuracy. The synergy of correspondence networks and diversified voting works exceedingly well, achieving new state-of-the-art results on both the DAVIS and YouTube-VOS datasets while running significantly faster, at 20+ FPS for multiple objects, without bells and whistles.
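The affinity change described above can be illustrated with a small NumPy sketch. This is a hedged illustration, not the authors' implementation: keys are plain arrays, the names `dot_affinity`, `l2_affinity`, `readout`, and `winning_memory_nodes` are hypothetical, and the 1/sqrt(C) scaling is an assumed choice. The L2 variant uses the expansion -||k - q||^2 = 2 k·q - ||k||^2 - ||q||^2 and drops the ||q||^2 term, because it is constant for each query and cancels in the softmax over memory nodes.

```python
import numpy as np


def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_affinity(mem_keys, query_keys):
    # mem_keys: (N, C) memory nodes; query_keys: (M, C) query nodes.
    # Returns (N, M): inner-product similarity of every memory/query pair.
    return mem_keys @ query_keys.T


def l2_affinity(mem_keys, query_keys):
    # Negative squared Euclidean distance without the per-query -||q||^2 term,
    # which is constant along the memory axis and cancels in the softmax.
    mem_sq = (mem_keys ** 2).sum(axis=1, keepdims=True)  # (N, 1)
    return 2.0 * mem_keys @ query_keys.T - mem_sq


def readout(mem_keys, mem_values, query_keys, affinity_fn):
    # Each query node aggregates ("votes over") memory values with weights
    # given by a softmax over the memory axis. The 1/sqrt(C) scale is assumed.
    c = mem_keys.shape[1]
    weights = softmax(affinity_fn(mem_keys, query_keys) / np.sqrt(c), axis=0)
    return mem_values.T @ weights, weights  # (D, M) features, (N, M) weights


def winning_memory_nodes(weights):
    # Memory nodes that receive the largest vote from at least one query.
    return sorted(set(weights.argmax(axis=0).tolist()))


# Toy data: memory node 0 has a large norm; nodes 1 and 2 match the queries.
mem_keys = np.array([[10.0, 10.0], [1.0, 0.0], [0.0, 1.0]])
mem_values = np.eye(3)
query_keys = np.array([[1.0, 0.0], [0.0, 1.0]])

_, w_dot = readout(mem_keys, mem_values, query_keys, dot_affinity)
_, w_l2 = readout(mem_keys, mem_values, query_keys, l2_affinity)
print(winning_memory_nodes(w_dot))  # [0]
print(winning_memory_nodes(w_l2))   # [1, 2]
```

Under the inner product, the high-norm memory node wins every query's vote regardless of what the query looks like; under the negative squared Euclidean distance, each query is matched to its nearest memory node, so more of the memory contributes. Because the weights depend only on keys, the same weight matrix could be applied to the value features of several objects, which is in line with the abstract's point about not re-encoding mask features per object.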

Results

All values are for STCN unless marked STCN (MS). The same figures are listed under the Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation task leaderboards; the Semi-Supervised Video Object Segmentation listing omits the YouTube-VOS 2019 entry with Mean Jaccard & F-Measure 82.7 and the DAVIS 2017 (val) F-measure entry.

DAVIS (J = Jaccard, F = F-measure)

| Dataset | J&F | J (Mean) | J (Recall) | J (Decay) | F (Mean) | F (Recall) | F (Decay) | Speed (FPS) |
|---|---|---|---|---|---|---|---|---|
| DAVIS 2016 | 91.7 | 90.4 | 98.1 | 4.1 | 93 | 97.1 | 4.3 | 26.9 |
| DAVIS 2017 (val) | 85.3 | 82 | 91.3 | 6.2 | 88.6 | 94.6 | 85.3 | 20.2 |
| DAVIS 2017 (test-dev) | 79.9 | 76.3 | 85.5 | 10.5 | 83.5 | 89.7 | 10.3 | n/a |

DAVIS 2017 (val) is additionally listed with F-measure 88.6.

YouTube-VOS

| Dataset | Model | Overall | J (Seen) | J (Unseen) | F (Seen) | F (Unseen) |
|---|---|---|---|---|---|---|
| YouTube-VOS 2018 | STCN | n/a | 83.2 | 79 | 87.9 | 87.3 |
| YouTube-VOS 2019 | STCN | 84.2 | 82.6 | 79.4 | 87 | 87.7 |
| YouTube-VOS 2019 | STCN (MS) | 85.2 | 83.5 | 80.8 | 87.8 | 88.8 |
| YouTube-VOS 2019 | STCN | 82.7* | 81.1 | 78.2 | n/a | 85.9 |

*Listed as "Mean Jaccard & F-Measure".

MOSE

| Dataset | J&F | J | F |
|---|---|---|---|
| MOSE | 50.8 | 46.6 | 55 |
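The aggregate scores in these results follow simple formulas that can be checked against the reported numbers. The sketch below assumes NumPy and uses hypothetical helper names: it computes the region similarity J as mask intersection-over-union, the DAVIS J&F as the mean of J (Mean) and F (Mean), and the YouTube-VOS Overall score as the mean of the four seen/unseen J and F scores. The boundary F-measure itself requires contour matching and is not reproduced here.

```python
import numpy as np


def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: scored as a perfect match (assumed convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def davis_jf(j_mean: float, f_mean: float) -> float:
    """DAVIS J&F: arithmetic mean of J (Mean) and F (Mean)."""
    return (j_mean + f_mean) / 2


def youtube_vos_overall(j_seen: float, j_unseen: float,
                        f_seen: float, f_unseen: float) -> float:
    """YouTube-VOS Overall: mean of the four seen/unseen J and F scores."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(jaccard(pred, gt))  # 0.5 (2 shared pixels out of 4 in the union)

# Cross-checks against the reported STCN numbers.
print(round(davis_jf(90.4, 93.0), 1))  # 91.7 (DAVIS 2016)
print(round(davis_jf(82.0, 88.6), 1))  # 85.3 (DAVIS 2017 val)
print(round(youtube_vos_overall(82.6, 79.4, 87.0, 87.7), 1))  # 84.2 (YouTube-VOS 2019)
print(round(youtube_vos_overall(83.5, 80.8, 87.8, 88.8), 1))  # 85.2 (YouTube-VOS 2019, MS)
```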
