A Transductive Approach for Video Object Segmentation

Yizhuo Zhang, Zhirong Wu, Houwen Peng, Stephen Lin

Date: 2020-04-15
Venue: CVPR 2020
Tasks: Semi-Supervised Video Object Segmentation, Optical Flow Estimation, Semantic Segmentation, Video Object Segmentation, Instance Segmentation, Video Semantic Segmentation


Abstract

Semi-supervised video object segmentation aims to separate a target object from a video sequence, given the mask in the first frame. Most current prevailing methods utilize information from additional modules trained in other domains, such as optical flow and instance segmentation, and as a result they do not compete with other methods on common ground. To address this issue, we propose a simple yet strong transductive method, in which additional modules, datasets, and dedicated architectural designs are not needed. Our method takes a label propagation approach where pixel labels are passed forward based on feature similarity in an embedding space. Different from other propagation methods, ours diffuses temporal information in a holistic manner that takes into account long-term object appearance. In addition, our method requires little additional computational overhead and runs at a fast speed of about 37 fps. Our single model with a vanilla ResNet50 backbone achieves an overall score of 72.3 on the DAVIS 2017 validation set and 63.1 on the test set. This simple yet high-performing and efficient method can serve as a solid baseline that facilitates future research. Code and models are available at https://github.com/microsoft/transductive-vos.pytorch.
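
The label propagation idea in the abstract (pixel labels passed forward according to feature similarity in an embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `propagate_labels`, the cosine similarity, the softmax weighting, and the `temperature` parameter are all assumptions chosen for illustration, and the embeddings are assumed to come from some backbone that is not shown here.

```python
import numpy as np


def propagate_labels(ref_feats, ref_labels, tgt_feats, temperature=0.1):
    """Propagate soft labels from reference pixels to target pixels.

    ref_feats:  (N, D) embeddings of labeled pixels from earlier frames.
    ref_labels: (N, C) one-hot or soft labels for those pixels.
    tgt_feats:  (M, D) embeddings of pixels in the frame to be labeled.
    Returns an (M, C) array of soft labels whose rows sum to 1.

    Hypothetical helper: similarity measure and temperature are assumptions.
    """
    eps = 1e-12  # guards against division by zero for all-zero embeddings
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + eps)
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + eps)

    # Cosine similarity between every target pixel and every reference pixel.
    sim = (tgt @ ref.T) / temperature              # shape (M, N)

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each target pixel receives a similarity-weighted average of labels.
    return weights @ ref_labels                    # shape (M, C)


# Toy usage: two labeled reference pixels, two unlabeled target pixels.
ref_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
ref_labels = np.array([[1.0, 0.0], [0.0, 1.0]])    # one-hot: object 0, object 1
tgt_feats = np.array([[0.9, 0.1], [0.2, 0.8]])
soft = propagate_labels(ref_feats, ref_labels, tgt_feats)
pred = soft.argmax(axis=1)                         # hard label per target pixel
```

In this sketch, reference pixels could be pooled from several earlier frames rather than only the previous one, which is one simple way to reflect long-term object appearance; how the paper actually samples and weights frames is not described in this page.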

Results

Task	Dataset	Metric	Value	Model
Video	DAVIS 2017 (val)	F-measure (Mean)	74.7	TVOS
Video	DAVIS 2017 (val)	J&F	72.3	TVOS
Video	DAVIS 2017 (val)	Jaccard (Mean)	69.9	TVOS
Video	DAVIS (no YouTube-VOS training)	D17 test (F)	67.4	TVOS
Video	DAVIS (no YouTube-VOS training)	D17 test (G)	63.1	TVOS
Video	DAVIS (no YouTube-VOS training)	D17 test (J)	58.8	TVOS
Video	DAVIS (no YouTube-VOS training)	D17 val (F)	74.7	TVOS
Video	DAVIS (no YouTube-VOS training)	D17 val (G)	72.3	TVOS
Video	DAVIS (no YouTube-VOS training)	D17 val (J)	69.9	TVOS
Video	DAVIS (no YouTube-VOS training)	FPS	37	TVOS
Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	74.7	TVOS
Video Object Segmentation	DAVIS 2017 (val)	J&F	72.3	TVOS
Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	69.9	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 test (F)	67.4	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 test (G)	63.1	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 test (J)	58.8	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (F)	74.7	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (G)	72.3	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (J)	69.9	TVOS
Video Object Segmentation	DAVIS (no YouTube-VOS training)	FPS	37	TVOS
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	74.7	TVOS
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	J&F	72.3	TVOS
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	69.9	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 test (F)	67.4	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 test (G)	63.1	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 test (J)	58.8	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (F)	74.7	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (G)	72.3	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (J)	69.9	TVOS
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	FPS	37	TVOS
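
As a consistency check on the table, the overall DAVIS score (listed as J&F or G) is conventionally the average of the Jaccard mean (J) and the F-measure mean (F). The short sketch below recomputes it from the J and F values above; the helper name `j_and_f` is hypothetical.

```python
def j_and_f(j_mean, f_mean):
    """Overall DAVIS score as the average of region (J) and boundary (F) means."""
    return (j_mean + f_mean) / 2.0


# Values copied from the results table above.
val_score = round(j_and_f(69.9, 74.7), 1)    # DAVIS 2017 validation set
test_score = round(j_and_f(58.8, 67.4), 1)   # DAVIS 2017 test set
```

The recomputed values match the reported overall scores of 72.3 (validation) and 63.1 (test).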
