A Generative Appearance Model for End-to-end Video Object Segmentation

Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, Michael Felsberg

2018-11-28 | CVPR 2019

Tasks: Semi-Supervised Video Object Segmentation, One-shot visual object segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
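The idea of an appearance module that fits class-conditional feature distributions and then returns posterior class probabilities can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the form of the generative model, so the sketch assumes one diagonal-covariance Gaussian per class, uses plain Python lists instead of network feature maps, and the function names `fit_class_gaussians` and `posterior` are hypothetical.

```python
import math

def fit_class_gaussians(features, soft_masks, eps=1e-3):
    """Fit one diagonal Gaussian per class from soft-labelled feature vectors.

    features:   list of D-dimensional feature vectors (one per pixel)
    soft_masks: list of K-length class-probability vectors (one per pixel)
    Assumption: a single Gaussian per class; the abstract does not specify this.
    """
    num_classes = len(soft_masks[0])
    dim = len(features[0])
    params = []
    for k in range(num_classes):
        # Total soft weight assigned to class k (eps avoids division by zero).
        weight = sum(m[k] for m in soft_masks) + eps
        mean = [
            sum(m[k] * f[d] for f, m in zip(features, soft_masks)) / weight
            for d in range(dim)
        ]
        var = [
            sum(m[k] * (f[d] - mean[d]) ** 2 for f, m in zip(features, soft_masks)) / weight + eps
            for d in range(dim)
        ]
        prior = weight / len(features)
        params.append((prior, mean, var))
    return params

def posterior(feature, params):
    """Return p(class | feature) for every class using Bayes' rule."""
    log_joint = []
    for prior, mean, var in params:
        log_lik = sum(
            -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(feature, mean, var)
        )
        log_joint.append(math.log(prior) + log_lik)
    # Normalise in log space for numerical stability (softmax over classes).
    top = max(log_joint)
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy "first frame": two background pixels and two target pixels.
first_frame_features = [[-1.0, 0.2], [-0.8, -0.2], [0.9, 0.1], [1.1, -0.1]]
first_frame_mask = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # [background, target]

model = fit_class_gaussians(first_frame_features, first_frame_mask)
probs = posterior([1.0, 0.0], model)  # feature from a "new frame"
print(probs)
```

Every step here is a closed-form sum, product, logarithm, or exponential, which is the property that allows a model of this kind to sit inside a network and receive gradients. The sketch itself performs no automatic differentiation.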

Results

Tasks: Video, Video Object Segmentation, Semi-Supervised Video Object Segmentation (identical results are listed under each task)

Dataset	Metric	Value	Model
DAVIS 2017 (val)	F-measure (Decay)	15.8	AGAME
DAVIS 2017 (val)	F-measure (Mean)	73.6	AGAME
DAVIS 2017 (val)	F-measure (Recall)	83.4	AGAME
DAVIS 2017 (val)	J&F	71.05	AGAME
DAVIS 2017 (val)	Jaccard (Decay)	14	AGAME
DAVIS 2017 (val)	Jaccard (Mean)	68.5	AGAME
DAVIS 2017 (val)	Jaccard (Recall)	78.4	AGAME
DAVIS 2016	F-measure (Decay)	9.8	AGAME
DAVIS 2016	F-measure (Mean)	82.2	AGAME
DAVIS 2016	F-measure (Recall)	90.3	AGAME
DAVIS 2016	J&F	81.85	AGAME
DAVIS 2016	Jaccard (Decay)	9.4	AGAME
DAVIS 2016	Jaccard (Mean)	81.5	AGAME
DAVIS 2016	Jaccard (Recall)	93.6	AGAME
DAVIS 2017 (test-dev)	F-measure (Decay)	27.6	AGAME
DAVIS 2017 (test-dev)	F-measure (Mean)	55.3	AGAME
DAVIS 2017 (test-dev)	F-measure (Recall)	61.1	AGAME
DAVIS 2017 (test-dev)	J&F	52.3	AGAME
DAVIS 2017 (test-dev)	Jaccard (Decay)	28.9	AGAME
DAVIS 2017 (test-dev)	Jaccard (Mean)	49.2	AGAME
DAVIS 2017 (test-dev)	Jaccard (Recall)	53.2	AGAME
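
As a quick consistency check on the table above, the J&F rows can be compared with the average of the Jaccard (Mean) and F-measure (Mean) rows. The sketch assumes the usual DAVIS convention that J&F is the arithmetic mean of those two values; the dictionary `rows` and the helper `j_and_f` are illustrative names, and the numbers are copied from the table.

```python
# (Jaccard mean, F-measure mean, reported J&F) per dataset, copied from the table.
rows = {
    "DAVIS 2017 (val)": (68.5, 73.6, 71.05),
    "DAVIS 2016": (81.5, 82.2, 81.85),
    "DAVIS 2017 (test-dev)": (49.2, 55.3, 52.3),
}

def j_and_f(j_mean, f_mean):
    """Arithmetic mean of region similarity (J) and contour accuracy (F)."""
    return (j_mean + f_mean) / 2

# Absolute gap between the recomputed and the reported J&F for each dataset.
gaps = {}
for dataset, (j_mean, f_mean, reported) in rows.items():
    computed = j_and_f(j_mean, f_mean)
    gaps[dataset] = abs(computed - reported)
    print(f"{dataset}: computed {computed:.2f}, reported {reported}")
```

Under that assumption the reported values match the recomputed ones up to rounding to one decimal place.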
