Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping

Long Lian, Zhirong Wu, Stella X. Yu

2023-04-17, CVPR 2023

Tasks: Unsupervised Video Object Segmentation, Optical Flow Estimation, Motion Segmentation, Segmentation, Semantic Segmentation, Object Discovery, Video Object Segmentation, Video Semantic Segmentation, Unsupervised Object Segmentation


Abstract

We study learning object segmentation from unlabeled videos. Humans can easily segment moving objects without knowing what they are. The Gestalt law of common fate, i.e., elements that move at the same speed belong together, has inspired unsupervised object discovery based on motion segmentation. However, common fate is not a reliable indicator of objectness: parts of an articulated or deformable object may not move at the same speed, whereas shadows and reflections of an object always move with it but are not part of it. Our insight is to bootstrap objectness by first learning image features from relaxed common fate and then refining them based on visual appearance grouping, both within the image itself and statistically across images. Specifically, we first learn an image segmenter in the loop of approximating optical flow with a constant flow per segment plus a small within-segment residual flow, and then refine it for more coherent appearance and statistical figure-ground relevance. On unsupervised video object segmentation, using only a ResNet backbone and convolutional heads, our model surpasses the state of the art by absolute gains of 7%, 9%, and 5% on DAVIS16, STv2, and FBMS59, respectively, demonstrating the effectiveness of our ideas. Our code is publicly available.
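The relaxed common fate idea (constant flow per segment plus a small within-segment residual) can be illustrated with a minimal NumPy sketch. It assumes soft masks `masks` of shape (K, H, W) that sum to 1 per pixel, takes each segment's constant flow as the mask-weighted mean of the observed flow, and adds a hypothetical per-segment `residual` array whose mask-weighted magnitude is penalized. The function name, the L1 losses, and `residual_weight` are illustrative assumptions, not the paper's published formulation or official code.

```python
import numpy as np


def relaxed_common_fate_loss(flow, masks, residual, residual_weight=0.1, eps=1e-8):
    """Sketch of a relaxed common fate objective (illustrative, not the official RCF code).

    flow:     (H, W, 2) observed optical flow
    masks:    (K, H, W) soft segment masks, assumed to sum to 1 over K at each pixel
    residual: (K, H, W, 2) hypothetical within-segment residual flow per segment
    """
    # Constant flow per segment: the mask-weighted mean of the observed flow.
    area = masks.sum(axis=(1, 2))                                              # (K,)
    seg_flow = np.einsum("khw,hwc->kc", masks, flow) / (area[:, None] + eps)  # (K, 2)

    # Each segment explains a pixel with its constant flow plus its residual;
    # the masks blend the per-segment explanations into one flow field.
    per_segment = seg_flow[:, None, None, :] + residual                        # (K, H, W, 2)
    recon = np.einsum("khw,khwc->hwc", masks, per_segment)                     # (H, W, 2)

    # L1 reconstruction error against the observed flow.
    recon_loss = np.abs(recon - flow).mean()
    # Keep residuals small inside the segments that use them ("relaxed" common fate).
    residual_penalty = (masks[..., None] * np.abs(residual)).sum() / flow.size
    return recon_loss + residual_weight * residual_penalty, recon
```

With hard masks that match two rigidly moving regions and a zero residual, the loss is close to zero. Merging both regions into a single segment makes the constant-flow approximation fail, and a residual can absorb that error only at a cost scaled by `residual_weight`, which is what pushes the segmenter toward masks whose pixels share nearly the same motion.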

Results

Task	Dataset	Metric	Value (%)	Model
Instance Segmentation	SegTrack-v2	mIoU	79.6	RCF (with post-processing)
Instance Segmentation	SegTrack-v2	mIoU	76.7	RCF (without post-processing)
Instance Segmentation	FBMS-59	mIoU	72.4	RCF (with post-processing)
Instance Segmentation	FBMS-59	mIoU	69.9	RCF (without post-processing)
Instance Segmentation	DAVIS 2016	J score	83	RCF (with post-processing)
Instance Segmentation	DAVIS 2016	J score	80.9	RCF (without post-processing)
Unsupervised Object Segmentation	SegTrack-v2	mIoU	79.6	RCF (with post-processing)
Unsupervised Object Segmentation	SegTrack-v2	mIoU	76.7	RCF (without post-processing)
Unsupervised Object Segmentation	FBMS-59	mIoU	72.4	RCF (with post-processing)
Unsupervised Object Segmentation	FBMS-59	mIoU	69.9	RCF (without post-processing)
Unsupervised Object Segmentation	DAVIS 2016	J score	83	RCF (with post-processing)
Unsupervised Object Segmentation	DAVIS 2016	J score	80.9	RCF (without post-processing)
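The results report region similarity J on DAVIS 2016 and mIoU on SegTrack-v2 and FBMS-59; both are built from the intersection over union of a predicted and a ground-truth foreground mask. The sketch below assumes binary masks and a plain average over frames, and treats two empty masks as a perfect match (an assumed convention). Official benchmark toolkits may average per sequence first, so this illustrates the metric rather than reproducing the evaluation scripts; `jaccard` and `mean_jaccard` are hypothetical helper names.

```python
import numpy as np


def jaccard(pred, gt):
    """Region similarity J (intersection over union) for one pair of binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: count as a perfect match (assumed convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_jaccard(preds, gts):
    """Average J over frames, scaled to a percentage as in the results table."""
    return 100.0 * float(np.mean([jaccard(p, g) for p, g in zip(preds, gts)]))
```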
