Learning What to Learn for Video Object Segmentation

Goutam Bhat, Felix Järemo Lawin, Martin Danelljan, Andreas Robinson, Michael Felsberg, Luc van Gool, Radu Timofte

2020-03-25, ECCV 2020

Tasks: Few-Shot Learning, Semi-Supervised Video Object Segmentation, One-shot visual object segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond standard few-shot learning techniques by learning what the few-shot learner should learn. This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach. We perform extensive experiments on multiple benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result.
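The idea of an internal learner that fits a parametric target model by minimizing a first-frame segmentation error can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a plain linear model over per-pixel feature vectors, a ridge-regularized squared error, and steepest descent with an exact step length, all chosen here for illustration. The names `fit_target_model`, `segment`, `lam`, and `steps` are hypothetical. Every operation in the loop is differentiable, which is the property that lets such a learner sit inside an end-to-end trainable network.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def loss(w, feats, labels, lam):
    """Squared segmentation error on the reference frame plus ridge penalty."""
    residual = sum((dot(w, x) - y) ** 2 for x, y in zip(feats, labels))
    return residual + lam * dot(w, w)


def fit_target_model(feats, labels, lam=0.01, steps=5):
    """Fit linear target-model weights with a few steepest-descent steps.

    Assumption for illustration: the target model is a single linear filter
    and the label is the raw binary mask value of each pixel.
    """
    dim = len(feats[0])
    w = [0.0] * dim
    for _ in range(steps):
        res = [dot(w, x) - y for x, y in zip(feats, labels)]
        # Gradient of the loss: 2 * sum_i res_i * x_i + 2 * lam * w
        g = [
            2.0 * sum(r * x[d] for r, x in zip(res, feats)) + 2.0 * lam * w[d]
            for d in range(dim)
        ]
        gg = dot(g, g)
        if gg == 0.0:
            break  # already at the minimum
        # Exact step length for a quadratic loss: (g.g) / (g^T H g),
        # where H = 2 * (X^T X + lam * I).
        g_h_g = 2.0 * (sum(dot(x, g) ** 2 for x in feats) + lam * gg)
        alpha = gg / g_h_g
        w = [wd - alpha * gd for wd, gd in zip(w, g)]
    return w


def segment(w, feats, threshold=0.5):
    """Apply the learned target model to the features of a later frame."""
    return [1 if dot(w, x) > threshold else 0 for x in feats]


# Toy data: four reference-frame pixels with 2-D features and mask labels.
first_frame_feats = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
first_frame_mask = [1, 1, 0, 0]

weights = fit_target_model(first_frame_feats, first_frame_mask)
later_frame_feats = [[0.8, 0.2], [0.2, 0.8]]
predicted_mask = segment(weights, later_frame_feats)
```

In this toy setting the learner regresses the mask values directly. The abstract's "learning what the few-shot learner should learn" goes beyond that by also learning the targets the learner fits, which this sketch does not attempt to reproduce.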

Results

Task	Dataset	Metric	Value	Model
Video	DAVIS (no YouTube-VOS training)	D17 val (F)	76.3	LWL
Video	DAVIS (no YouTube-VOS training)	D17 val (G)	74.3	LWL
Video	DAVIS (no YouTube-VOS training)	D17 val (J)	72.2	LWL
Video	DAVIS (no YouTube-VOS training)	FPS	14	LWL
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (F)	76.3	LWL
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (G)	74.3	LWL
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (J)	72.2	LWL
Video Object Segmentation	DAVIS (no YouTube-VOS training)	FPS	14	LWL
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (F)	76.3	LWL
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (G)	74.3	LWL
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (J)	72.2	LWL
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	FPS	14	LWL
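
For readers unfamiliar with the DAVIS 2017 metrics: J measures region similarity, F measures contour accuracy, and the overall score G is their mean. A quick check, using a hypothetical helper `overall_score`, shows that the reported values are consistent up to rounding:

```python
def overall_score(j_mean, f_mean):
    """DAVIS overall score G: the mean of region similarity J and contour accuracy F."""
    return (j_mean + f_mean) / 2.0


# J and F values for LWL taken from the table above.
g = overall_score(72.2, 76.3)
print(f"G = {g:.2f}")
```

The result agrees with the reported G of 74.3 to within rounding of the published J and F values.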
