CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF

Linchao Bao, Baoyuan Wu, Wei Liu

2018-03-26, CVPR 2018

Tasks: One-Shot Segmentation, Semi-Supervised Video Object Segmentation, Optical Flow Estimation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation


Abstract

This paper addresses the problem of video object segmentation, where the initial object mask is given in the first frame of an input video. We propose a novel spatio-temporal Markov Random Field (MRF) model defined over pixels to handle this problem. Unlike conventional MRF models, the spatial dependencies among pixels in our model are encoded by a Convolutional Neural Network (CNN). Specifically, for a given object, the probability of a labeling of a set of spatially neighboring pixels can be predicted by a CNN trained for this specific object. As a result, higher-order, richer dependencies among pixels in the set can be implicitly modeled by the CNN. With temporal dependencies established by optical flow, the resulting MRF model combines both spatial and temporal cues for tackling video object segmentation. However, performing inference in the MRF model is very difficult due to the very high-order dependencies. To this end, we propose a novel CNN-embedded algorithm to perform approximate inference in the MRF. This algorithm proceeds by alternating between a temporal fusion step and a feed-forward CNN step. When initialized with an appearance-based one-shot segmentation CNN, our model outperforms the winning entries of the DAVIS 2017 Challenge, without resorting to model ensembling or any dedicated detectors.
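The alternating inference scheme described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: masks are nested lists of per-pixel foreground probabilities, optical flow is simplified to integer per-pixel displacements, the temporal fusion step is assumed to be a weighted average of each frame's current estimate and its flow-warped neighbour masks, and the object-specific CNN is replaced by a placeholder pointwise threshold so the example runs without a network. The names `warp`, `threshold_refiner`, `alternate_inference`, `to_prev`, `to_next` and `alpha` are all invented for this sketch.

```python
from typing import Callable, List, Tuple

Mask = List[List[float]]            # per-pixel foreground probability in [0, 1]
Flow = List[List[Tuple[int, int]]]  # integer (dy, dx) per pixel; real optical flow is sub-pixel


def warp(mask: Mask, flow: Flow) -> Mask:
    """Backward warp with nearest-neighbour lookup.

    Output pixel (y, x) takes the value of `mask` at (y + dy, x + dx),
    clamped to the frame, which aligns a neighbouring frame's mask to
    the current frame.
    """
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            row.append(mask[sy][sx])
        out.append(row)
    return out


def threshold_refiner(mask: Mask) -> Mask:
    """Hypothetical placeholder for the object-specific CNN: a pointwise 0.5 threshold."""
    return [[1.0 if p >= 0.5 else 0.0 for p in row] for row in mask]


def alternate_inference(
    init: List[Mask],
    to_prev: List[Flow],  # to_prev[t]: flow from frame t to frame t-1 (to_prev[0] unused)
    to_next: List[Flow],  # to_next[t]: flow from frame t to frame t+1 (last entry unused)
    refine: Callable[[Mask], Mask] = threshold_refiner,
    iters: int = 3,
    alpha: float = 0.5,   # weight kept on the frame's own estimate during fusion
) -> List[Mask]:
    masks = [[row[:] for row in m] for m in init]
    n = len(masks)
    for _ in range(iters):
        # Temporal fusion step: blend every frame (using the previous iteration's
        # masks) with the flow-aligned masks of its neighbouring frames.
        fused = []
        for t in range(n):
            neighbours = []
            if t > 0:
                neighbours.append(warp(masks[t - 1], to_prev[t]))
            if t < n - 1:
                neighbours.append(warp(masks[t + 1], to_next[t]))
            if not neighbours:
                fused.append(masks[t])
                continue
            h, w = len(masks[t]), len(masks[t][0])
            fused.append([
                [alpha * masks[t][y][x]
                 + (1 - alpha) * sum(nb[y][x] for nb in neighbours) / len(neighbours)
                 for x in range(w)]
                for y in range(h)
            ])
        # Feed-forward step: refine each fused mask; frame 0 keeps the given mask.
        masks = [[row[:] for row in init[0]]] + [refine(f) for f in fused[1:]]
    return masks


if __name__ == "__main__":
    # Three static 1x5 frames (zero flow); frame 1 has a one-pixel hole at x = 2.
    zero = [[(0, 0)] * 5]
    video = [[[0.0, 1.0, 1.0, 1.0, 0.0]],
             [[0.0, 1.0, 0.0, 1.0, 0.0]],
             [[0.0, 1.0, 1.0, 1.0, 0.0]]]
    result = alternate_inference(video, [zero] * 3, [zero] * 3)
    print(result[1])  # → [[0.0, 1.0, 1.0, 1.0, 0.0]]
```

In the toy run, temporal fusion pulls the hole pixel in frame 1 up to 0.5 from its two neighbours and the refinement step then fills it. Swapping a trained per-object segmentation network in as `refine` (taking the image as well as the mask) would bring the sketch closer to the method the abstract describes; holding the first frame fixed is a choice made here because its mask is given.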

Results

The same results are listed under three tasks: Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation. Jaccard is region similarity (J) and F-measure is contour accuracy (F); J&F is the mean of the two. DAVIS values are percentages.

Dataset	Model	J&F	Jaccard (Mean)	Jaccard (Recall)	Jaccard (Decay)	F-measure (Mean)	F-measure (Recall)	F-measure (Decay)
DAVIS 2016	CINM	84.2	83.4	94.9	12.3	85.0	92.1	14.7
DAVIS 2017 (val)	CINM	70.6	67.2	74.5	24.6	74.0	81.6	26.2
DAVIS 2017 (test-dev)	CINM	67.5	64.5	73.8	20.0	70.5	79.6	20.0

Dataset	Model	mIoU
YouTube	MRFCNN	0.784
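The DAVIS metrics in the results can be illustrated with a short sketch. It assumes binary masks stored as nested lists and follows the usual DAVIS conventions: recall is the fraction of frames scoring above 0.5, and decay is the mean score over the first quarter of frames minus the mean over the last quarter. The official evaluation toolkit may split frames into quarters slightly differently, and the contour F-measure is omitted because it requires boundary matching. The function names `jaccard`, `sequence_stats` and `j_and_f` are hypothetical. The last lines check that each reported J&F equals the mean of the reported Jaccard and F-measure means.

```python
from typing import List, Sequence, Tuple

BinaryMask = Sequence[Sequence[int]]  # nested lists of 0/1 values


def jaccard(pred: BinaryMask, gt: BinaryMask) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def sequence_stats(scores: List[float], threshold: float = 0.5) -> Tuple[float, float, float]:
    """(mean, recall, decay) of one object's per-frame scores.

    recall: fraction of frames whose score exceeds `threshold`.
    decay:  mean of the first quarter of frames minus mean of the last quarter
            (simplified bin edges; the official toolkit may differ slightly).
    """
    n = len(scores)
    mean = sum(scores) / n
    recall = sum(s > threshold for s in scores) / n
    q = max(n // 4, 1)
    decay = sum(scores[:q]) / q - sum(scores[-q:]) / q
    return mean, recall, decay


def j_and_f(j_mean: float, f_mean: float) -> float:
    """J&F: average of region similarity J and contour accuracy F."""
    return (j_mean + f_mean) / 2


if __name__ == "__main__":
    print(jaccard([[1, 1, 0]], [[1, 0, 0]]))  # → 0.5
    # Reported (Jaccard mean, F-measure mean, J&F) for CINM from the results table.
    reported = {
        "DAVIS 2016": (83.4, 85.0, 84.2),
        "DAVIS 2017 (val)": (67.2, 74.0, 70.6),
        "DAVIS 2017 (test-dev)": (64.5, 70.5, 67.5),
    }
    for name, (j, f, jf) in reported.items():
        print(name, abs(j_and_f(j, f) - jf) < 1e-6)  # → True for each row
```

A positive decay means the score drops over the course of the video, which is why the larger decay on DAVIS 2017 (val) than on DAVIS 2016 indicates more drift on the multi-object benchmark.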
