SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Zhihui Lin, Tianyu Yang, Maomao Li, Ziyu Wang, Chun Yuan, Wenhao Jiang, Wei Liu

Date: 2022-08-22
Venue: CVPR 2022
Tasks: Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS). However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features. Unlike previous methods, which only detect feature redundancy between frames, SWEM merges both intra-frame and inter-frame similar features by leveraging the sequential weighted EM algorithm. Further, adaptive weights for frame features give SWEM the flexibility to represent hard samples, improving the discrimination of templates. In addition, the proposed method maintains a fixed number of template features in memory, which keeps the inference complexity of the VOS system stable. Extensive experiments on the commonly used DAVIS and YouTube-VOS datasets verify the high efficiency (36 FPS) and high performance (84.3\% $\mathcal{J}\&\mathcal{F}$ on the DAVIS 2017 validation set) of SWEM. Code is available at: https://github.com/lmm077/SWEM.
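To make the idea of a fixed-size memory concrete, here is a minimal NumPy sketch of a sequential weighted EM update. It is an illustration under assumptions, not the authors' implementation: the function names (`e_step`, `sequential_weighted_em`), the cosine-similarity softmax kernel, the temperature value, the iteration count, and the initialization from the first frame are all hypothetical choices, and the per-feature weights are simply passed in rather than learned.

```python
import numpy as np


def e_step(features, bases, temperature=0.05, eps=1e-8):
    """Soft-assign each feature (N, C) to the bases (K, C); returns (N, K)."""
    # Assumed kernel: softmax over temperature-scaled cosine similarities.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    b = bases / (np.linalg.norm(bases, axis=1, keepdims=True) + eps)
    logits = f @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)


def sequential_weighted_em(frames, frame_weights, n_bases, n_iters=4, eps=1e-8):
    """Fold a stream of per-frame features into a fixed set of K bases.

    frames:        list of arrays, each of shape (N_t, C)
    frame_weights: list of arrays, each of shape (N_t,), positive weights
    """
    channels = frames[0].shape[1]
    # Hypothetical initialization: take the first K features of frame 0.
    bases = frames[0][:n_bases].copy()
    # Running sufficient statistics carried across frames.
    acc_numer = np.zeros((n_bases, channels))
    acc_denom = np.zeros((n_bases, 1))

    for feats, w in zip(frames, frame_weights):
        for _ in range(n_iters):
            # E-step: responsibilities of the current frame's features.
            resp = e_step(feats, bases)
            # M-step: weighted mean over past statistics plus this frame.
            rw = resp * w[:, None]                          # (N_t, K)
            numer = acc_numer + rw.T @ feats                # (K, C)
            denom = acc_denom + rw.sum(axis=0)[:, None]     # (K, 1)
            bases = numer / (denom + eps)
        # Keep only the accumulated statistics, never the raw features.
        acc_numer, acc_denom = numer, denom
    return bases
```

However many frames are processed, the memory holds only the `(K, C)` bases and their running statistics, which is the property the abstract describes as a fixed number of template features.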

Results

The rows below are listed identically under three task labels: Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation.

Dataset	Metric	Value	Model
MOSE	F	54.9	SWEM
MOSE	J	46.8	SWEM
MOSE	J&F	50.9	SWEM
DAVIS 2017 (val)	F-measure (Mean)	79.8	SWEM
DAVIS 2017 (val)	J&F	77.2	SWEM
DAVIS 2017 (val)	Jaccard (Mean)	74.5	SWEM
DAVIS 2016	F-measure (Mean)	89	SWEM (val)
DAVIS 2016	J&F	88.1	SWEM (val)
DAVIS 2016	Jaccard (Mean)	87.3	SWEM (val)
DAVIS 2016	Speed (FPS)	36	SWEM (val)
DAVIS (no YouTube-VOS training)	D16 val (F)	89	SWEM
DAVIS (no YouTube-VOS training)	D16 val (G)	88.1	SWEM
DAVIS (no YouTube-VOS training)	D16 val (J)	87.3	SWEM
DAVIS (no YouTube-VOS training)	D17 val (F)	79.8	SWEM
DAVIS (no YouTube-VOS training)	D17 val (G)	77.2	SWEM
DAVIS (no YouTube-VOS training)	D17 val (J)	74.5	SWEM
DAVIS (no YouTube-VOS training)	FPS	36	SWEM
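
The J&F score (labelled G in some rows) is the arithmetic mean of the region similarity J and the boundary accuracy F. The short check below recomputes it from the reported J and F values; the helper name `j_and_f` is hypothetical, and a tolerance of 0.1 is assumed because the listed numbers are rounded to one decimal place.

```python
def j_and_f(j_mean, f_mean):
    """Return J&F: the mean of region (J) and boundary (F) scores."""
    return (j_mean + f_mean) / 2.0


# (J, F, reported J&F) copied from the results listed above.
reported = {
    "MOSE": (46.8, 54.9, 50.9),
    "DAVIS 2017 (val)": (74.5, 79.8, 77.2),
    "DAVIS 2016": (87.3, 89.0, 88.1),
}

for name, (j, f, jf) in reported.items():
    # Inputs are already rounded, so compare with a small tolerance.
    assert abs(j_and_f(j, f) - jf) <= 0.1 + 1e-9, name
```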
