MAST: A Memory-Augmented Self-supervised Tracker

Zihang Lai, Erika Lu, Weidi Xie

2020-02-18, CVPR 2020
Tasks: Unsupervised Video Object Segmentation, Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that finally elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that for the first time is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use-cases for dense tracking, and will spur new interest in this research direction.

Results

Task	Dataset	Metric	Value	Model
Video	DAVIS 2017 (val)	F-measure (Mean)	67.6	MAST
Video	DAVIS 2017 (val)	F-measure (Recall)	77.7	MAST
Video	DAVIS 2017 (val)	J&F	65.5	MAST
Video	DAVIS 2017 (val)	Jaccard (Mean)	63.3	MAST
Video	DAVIS 2017 (val)	Jaccard (Recall)	73.2	MAST
Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	67.6	MAST
Video Object Segmentation	DAVIS 2017 (val)	F-measure (Recall)	77.7	MAST
Video Object Segmentation	DAVIS 2017 (val)	J&F	65.5	MAST
Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	63.3	MAST
Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Recall)	73.2	MAST
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	67.6	MAST
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	F-measure (Recall)	77.7	MAST
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	J&F	65.5	MAST
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	63.3	MAST
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Recall)	73.2	MAST
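
For readers unfamiliar with the metrics in the table: on DAVIS, J (Jaccard) is the intersection-over-union between a predicted and a ground-truth mask, and J&F is the average of the Jaccard mean and the F-measure mean. The sketch below is illustrative only and is not taken from the MAST code. The function names `jaccard` and `j_and_f` and the toy masks are hypothetical. It checks that the reported J&F of 65.5 is consistent with the reported Jaccard mean (63.3) and F-measure mean (67.6).

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are given as nested lists of 0/1 values with the same shape.
    """
    pred_flat = [bool(v) for row in pred for v in row]
    gt_flat = [bool(v) for row in gt for v in row]
    intersection = sum(p and g for p, g in zip(pred_flat, gt_flat))
    union = sum(p or g for p, g in zip(pred_flat, gt_flat))
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j_mean, f_mean):
    """Combined score: the plain average of Jaccard mean and F-measure mean."""
    return (j_mean + f_mean) / 2.0


# Toy masks (hypothetical data): 2 overlapping pixels, 4 pixels in the union.
toy_pred = [[1, 1, 0],
            [0, 1, 0]]
toy_gt = [[1, 0, 0],
          [0, 1, 1]]
toy_j = jaccard(toy_pred, toy_gt)

# Values from the table above: Jaccard (Mean) 63.3, F-measure (Mean) 67.6.
combined = j_and_f(63.3, 67.6)
print(toy_j, combined)
```

The average of 63.3 and 67.6 is 65.45, which matches the tabulated J&F of 65.5 after rounding to one decimal place.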
