Joint-task Self-supervised Learning for Temporal Correspondence

Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

2019-09-26 | NeurIPS 2019 12 | Unsupervised Video Object Segmentation, Semi-Supervised Video Object Segmentation, Self-Supervised Learning, Object Tracking


Abstract

This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between both tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization helps reduce ambiguities in fine-grained matching by narrowing down search regions, while fine-grained matching provides bottom-up features to facilitate region-level localization. Our method outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the fully supervised affinity feature representation obtained from a ResNet-18 pre-trained on ImageNet.
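The inter-frame affinity matrix mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `affinity_matrix` and `propagate`, the `temperature` parameter, and the choice to normalize over source pixels are all assumptions made for illustration. It shows only the generic idea of scoring every source pixel against every target pixel and using those scores to carry labels from one frame to the next.

```python
import numpy as np

def affinity_matrix(feat_src, feat_tgt, temperature=1.0):
    """Soft correspondence between two frames (illustrative helper, hypothetical name).

    feat_src: (C, N_src) per-pixel features of the source frame.
    feat_tgt: (C, N_tgt) per-pixel features of the target frame.
    Returns an (N_src, N_tgt) matrix whose columns each sum to 1.
    """
    logits = feat_src.T @ feat_tgt / temperature      # pairwise dot products
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)  # softmax over source pixels

def propagate(labels_src, affinity):
    """Carry (K, N_src) soft labels to the target frame, giving (K, N_tgt)."""
    return labels_src @ affinity

# Toy check: the target frame is a shuffled copy of the source frame.
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 8))
feats /= np.linalg.norm(feats, axis=0, keepdims=True)  # unit-norm features
perm = rng.permutation(8)

A = affinity_matrix(feats, feats[:, perm], temperature=0.01)
labels = np.eye(3)[:, rng.integers(0, 3, size=8)]      # one-hot labels, 3 classes
moved = propagate(labels, A)
```

In this toy setup each target pixel has an exact match in the source frame, so a low temperature makes every column of `A` nearly one-hot and the propagated labels follow the shuffle. Real video features are noisier, which is where restricting the search to a localized region, as the abstract describes, would reduce ambiguous matches.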

Results

Task	Dataset	Metric	Value	Model
Video	DAVIS 2017 (val)	F-measure (Mean)	61.3	UVC
Video	DAVIS 2017 (val)	F-measure (Recall)	69.8	UVC
Video	DAVIS 2017 (val)	J&F	59.5	UVC
Video	DAVIS 2017 (val)	Jaccard (Mean)	57.7	UVC
Video	DAVIS 2017 (val)	Jaccard (Recall)	68.3	UVC
Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	61.3	UVC
Video Object Segmentation	DAVIS 2017 (val)	F-measure (Recall)	69.8	UVC
Video Object Segmentation	DAVIS 2017 (val)	J&F	59.5	UVC
Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	57.7	UVC
Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Recall)	68.3	UVC
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	61.3	UVC
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	F-measure (Recall)	69.8	UVC
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	J&F	59.5	UVC
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	57.7	UVC
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Recall)	68.3	UVC
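As a quick consistency check on the table, and assuming the usual DAVIS convention that J&F is the average of the Jaccard mean and the F-measure mean, the reported values agree with each other. The variable names below are illustrative only.

```python
# Values copied from the results table (DAVIS 2017 val, UVC).
jaccard_mean = 57.7
f_measure_mean = 61.3

# Assumed convention: J&F is the arithmetic mean of the two scores.
j_and_f = (jaccard_mean + f_measure_mean) / 2
print(round(j_and_f, 1))  # 59.5, matching the J&F row
```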
