Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning What to Learn for Video Object Segmentation

Goutam Bhat, Felix Järemo Lawin, Martin Danelljan, Andreas Robinson, Michael Felsberg, Luc van Gool, Radu Timofte

Published: 2020-03-25
Venue: ECCV 2020
Tasks: Few-Shot Learning, Semi-Supervised Video Object Segmentation, One-shot visual object segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond standard few-shot learning techniques by learning what the few-shot learner should learn. This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach. We perform extensive experiments on multiple benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result.
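To make the idea of an internal learner that "minimizes a segmentation error in the first frame" more concrete, here is a minimal sketch in NumPy. It is not the authors' implementation. It assumes a linear per-pixel target model (roughly a 1x1 convolution), a regularized squared error against the raw first-frame mask, and a few steepest-descent steps with an exact step length. The names `fit_target_model`, `segmentation_loss`, `feats`, `labels`, and `reg` are hypothetical. The sketch also leaves out the "learning what to learn" part, in which the learning targets are themselves predicted by the network rather than taken directly from the mask.

```python
import numpy as np


def segmentation_loss(feats, labels, tau, reg=1e-2):
    """Regularized squared error of a linear target model (illustrative only)."""
    residual = feats @ tau - labels
    return 0.5 * np.sum(residual ** 2) + 0.5 * reg * np.sum(tau ** 2)


def fit_target_model(feats, labels, num_iters=5, reg=1e-2):
    """Fit target-model weights with a few steepest-descent steps.

    feats:  (N, C) array, one C-dimensional feature per first-frame pixel.
    labels: (N, D) array, the per-pixel target (here simply the mask, D = 1).
    Returns tau with shape (C, D).
    """
    tau = np.zeros((feats.shape[1], labels.shape[1]))
    for _ in range(num_iters):
        residual = feats @ tau - labels
        grad = feats.T @ residual + reg * tau
        # Exact minimizing step length along -grad for this quadratic loss.
        denom = np.sum((feats @ grad) ** 2) + reg * np.sum(grad ** 2)
        if denom == 0.0:
            break  # gradient is zero, already at the minimum
        step = np.sum(grad ** 2) / denom
        tau = tau - step * grad
    return tau


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 8))              # synthetic first-frame features
    mask = (feats[:, :1] > 0).astype(np.float64)   # synthetic binary mask
    tau = fit_target_model(feats, mask)
    print("loss before:", segmentation_loss(feats, mask, np.zeros((8, 1))))
    print("loss after: ", segmentation_loss(feats, mask, tau))
```

Every operation in the loop is differentiable, so a learner of this shape can in principle sit inside a network that is trained end to end, which is the property the abstract emphasizes.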

Results

Task | Dataset | Metric | Value | Model
Video | DAVIS (no YouTube-VOS training) | D17 val (F) | 76.3 | LWL
Video | DAVIS (no YouTube-VOS training) | D17 val (G) | 74.3 | LWL
Video | DAVIS (no YouTube-VOS training) | D17 val (J) | 72.2 | LWL
Video | DAVIS (no YouTube-VOS training) | FPS | 14 | LWL
Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (F) | 76.3 | LWL
Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (G) | 74.3 | LWL
Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (J) | 72.2 | LWL
Video Object Segmentation | DAVIS (no YouTube-VOS training) | FPS | 14 | LWL
Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (F) | 76.3 | LWL
Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (G) | 74.3 | LWL
Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (J) | 72.2 | LWL
Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | FPS | 14 | LWL
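
On DAVIS 2017, the overall score is conventionally the mean of region similarity (J) and contour accuracy (F). Assuming the G metric listed above is that mean, the three reported values can be checked against each other. The helper name `jf_mean` is hypothetical, and the check is only approximate because each listed value is already rounded to one decimal place.

```python
def jf_mean(j_score, f_score):
    """Mean of region similarity (J) and contour accuracy (F)."""
    return (j_score + f_score) / 2.0


# Values copied from the D17 val rows for LWL.
reported_j = 72.2
reported_f = 76.3
reported_g = 74.3

g_from_rounded = jf_mean(reported_j, reported_f)

# Each input may be off by up to 0.05 from rounding, and so may the reported
# G, so agreement within 0.1 is the most this check can confirm.
is_consistent = abs(g_from_rounded - reported_g) <= 0.1
print(g_from_rounded, is_consistent)
```

The mean of the rounded J and F values is about 74.25, which is consistent with the reported 74.3 if that figure was computed from unrounded scores.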
