Make One-Shot Video Object Segmentation Efficient Again

Tim Meinhardt, Laura Leal-Taixe

Published: 2020-12-03 (NeurIPS 2020)
Tasks: Semi-Supervised Video Object Segmentation, One-shot visual object segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Object Detection

Abstract

Video object segmentation (VOS) describes the task of segmenting a set of objects in each frame of a video. In the semi-supervised setting, the first mask of each object is provided at test time. Following the one-shot principle, fine-tuning VOS methods train a segmentation model separately on each given object mask. However, recently the VOS community has deemed such a test time optimization and its impact on the test runtime as unfeasible. To mitigate the inefficiencies of previous fine-tuning approaches, we present efficient One-Shot Video Object Segmentation (e-OSVOS). In contrast to most VOS approaches, e-OSVOS decouples the object detection task and predicts only local segmentation masks by applying a modified version of Mask R-CNN. The one-shot test runtime and performance are optimized without a laborious and handcrafted hyperparameter search. To this end, we meta learn the model initialization and learning rates for the test time optimization. To achieve optimal learning behavior, we predict individual learning rates at a neuron level. Furthermore, we apply an online adaptation to address the common performance degradation throughout a sequence by continuously fine-tuning the model on previous mask predictions supported by a frame-to-frame bounding box propagation. e-OSVOS provides state-of-the-art results on DAVIS 2016, DAVIS 2017, and YouTube-VOS for one-shot fine-tuning methods while reducing the test runtime substantially. Code is available at https://github.com/dvl-tum/e-osvos.
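The idea of fine-tuning with individual learning rates at a neuron level can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it uses a small linear layer instead of Mask R-CNN, and the learning rates are set by hand, whereas the abstract describes them as meta-learned. All function names and values (`finetune_step`, `neuron_lrs`, the toy input and target) are assumptions made for illustration.

```python
# Illustrative sketch only: a toy linear layer, not the Mask R-CNN used by e-OSVOS.
# In the paper the learning rates are meta-learned; here they are fixed by hand.

def predict(weights, x):
    """Apply a linear layer: one output per row (neuron) of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def loss(weights, x, target):
    """Squared error 0.5 * sum((W x - target)^2)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predict(weights, x), target))


def grads(weights, x, target):
    """Gradient of the squared error with respect to every weight."""
    errors = [p - t for p, t in zip(predict(weights, x), target)]
    return [[e * xi for xi in x] for e in errors]


def finetune_step(weights, x, target, neuron_lrs):
    """One gradient step in which each output neuron uses its own learning rate."""
    g = grads(weights, x, target)
    return [
        [w - lr * gw for w, gw in zip(w_row, g_row)]
        for w_row, g_row, lr in zip(weights, g, neuron_lrs)
    ]


weights = [[0.0, 0.0], [0.0, 0.0]]   # stands in for the meta-learned initialization
x, target = [1.0, 2.0], [1.0, -1.0]  # stands in for the first-frame supervision
neuron_lrs = [0.1, 0.05]             # hypothetical per-neuron learning rates

before = loss(weights, x, target)
weights = finetune_step(weights, x, target, neuron_lrs)
after = loss(weights, x, target)
print(before, after)
```

Each row of `weights` is updated with its own step size, so neurons that should adapt quickly to the given object can do so without a single global learning rate having to be tuned by hand.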

Results

The page lists the same results under three task labels: Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation. They are shown once here.

| Dataset | Metric | Value | Model |
|---|---|---|---|
| DAVIS 2017 (val) | F-measure (Mean) | 80 | e-OSVOS |
| DAVIS 2017 (val) | J&F | 77.2 | e-OSVOS |
| DAVIS 2017 (val) | Jaccard (Decay) | 13 | e-OSVOS |
| DAVIS 2017 (val) | Jaccard (Mean) | 74.4 | e-OSVOS |
| DAVIS 2016 | F-measure (Mean) | 87 | e-OSVOS |
| DAVIS 2016 | J&F | 86.8 | e-OSVOS |
| DAVIS 2016 | Jaccard (Decay) | 4.5 | e-OSVOS |
| DAVIS 2016 | Jaccard (Mean) | 86.6 | e-OSVOS |
| DAVIS 2017 (test-dev) | F-measure (Mean) | 68.6 | e-OSVOS |
| DAVIS 2017 (test-dev) | J&F | 64.8 | e-OSVOS |
| DAVIS 2017 (test-dev) | Jaccard (Decay) | 22.1 | e-OSVOS |
| DAVIS 2017 (test-dev) | Jaccard (Mean) | 60.9 | e-OSVOS |
| YouTube-VOS 2018 | F-Measure (Seen) | 66 | e-OSVOS |
| YouTube-VOS 2018 | F-Measure (Unseen) | 73.8 | e-OSVOS |
| YouTube-VOS 2018 | Jaccard (Seen) | 71.7 | e-OSVOS |
| YouTube-VOS 2018 | Jaccard (Unseen) | 74.3 | e-OSVOS |
| YouTube-VOS 2018 | Overall | 71.4 | e-OSVOS |
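To help read the metrics, the sketch below shows how a Jaccard score (region overlap, intersection over union) can be computed for a pair of binary masks, and how the J&F value relates to the Jaccard and F-measure means as their average. It is a simplified illustration, not the official DAVIS evaluation code: the function names are hypothetical, masks are plain nested lists, and returning 1.0 when both masks are empty is an assumed convention.

```python
def jaccard(pred, gt):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = sum(p and g for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row))
    union = sum(p or g for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """J&F is the average of the Jaccard mean and the F-measure mean."""
    return (j_mean + f_mean) / 2


# Toy masks: one overlapping pixel out of two labelled pixels in total.
pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
print(jaccard(pred, gt))

# Values taken from the results listed above.
print(j_and_f(74.4, 80))    # DAVIS 2017 (val), reported J&F: 77.2
print(j_and_f(86.6, 87))    # DAVIS 2016, reported J&F: 86.8
print(j_and_f(60.9, 68.6))  # DAVIS 2017 (test-dev), reported J&F: 64.8
```

The averages agree with the reported J&F values up to rounding, since the listed means are themselves rounded to one decimal place.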
