Hierarchical Memory Matching Network for Video Object Segmentation

Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim

Published: 2021-09-23
Venue: ICCV 2021 (October)
Tasks: Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Retrieval


Abstract

We present Hierarchical Memory Matching Network (HMMN) for semi-supervised video object segmentation. Based on a recent memory-based method [33], we propose two advanced memory read modules that enable us to perform memory reading in multiple scales while exploiting temporal smoothness. We first propose a kernel guided memory matching module that replaces the non-local dense memory read, commonly adopted in previous memory-based methods. The module imposes the temporal smoothness constraint in the memory read, leading to accurate memory retrieval. More importantly, we introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which memory read on a fine-scale is guided by that on a coarse-scale. With the module, we perform memory read in multiple scales efficiently and leverage both high-level semantic and low-level fine-grained memory features to predict detailed object masks. Our network achieves state-of-the-art performance on the validation sets of DAVIS 2016/2017 (90.8% and 84.7%) and YouTube-VOS 2018/2019 (82.6% and 82.5%), and test-dev set of DAVIS 2017 (78.6%). The source code and model are available online: https://github.com/Hongje/HMMN.
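The difference between a non-local dense memory read and a top-k guided read can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions, not the authors' implementation: features are flattened to (locations, channels), the coarse and fine scales are given the same number of locations so that coarse indices can be reused directly, and the names `dense_read`, `topk_indices`, and `guided_read` are hypothetical. The kernel-guided (temporal smoothness) module is not modelled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_read(q_key, m_key, m_val):
    """Dense read: every query location attends to every memory location."""
    affinity = q_key @ m_key.T                  # (Q, M)
    return softmax(affinity, axis=1) @ m_val    # (Q, D)


def topk_indices(q_key, m_key, k):
    """For each query location, indices of the k most similar memory locations."""
    affinity = q_key @ m_key.T                  # (Q, M)
    return np.argsort(-affinity, axis=1)[:, :k]  # (Q, k)


def guided_read(q_key, m_key, m_val, candidates):
    """Sparse read: each query attends only to its own candidate locations."""
    cand_keys = m_key[candidates]               # (Q, k, C)
    cand_vals = m_val[candidates]               # (Q, k, D)
    affinity = np.einsum("qc,qkc->qk", q_key, cand_keys)
    weights = softmax(affinity, axis=1)         # normalised over candidates only
    return np.einsum("qk,qkd->qd", weights, cand_vals)


# Toy data (assumed shapes): 5 query locations, 7 memory locations.
rng = np.random.default_rng(0)
q_coarse, m_coarse = rng.normal(size=(5, 8)), rng.normal(size=(7, 8))
q_fine, m_fine = rng.normal(size=(5, 4)), rng.normal(size=(7, 4))
v_fine = rng.normal(size=(7, 3))

# Candidates are chosen with coarse features, then used to read fine features.
candidates = topk_indices(q_coarse, m_coarse, k=2)
fine_readout = guided_read(q_fine, m_fine, v_fine, candidates)
```

In this sketch the fine-scale read computes only `k` affinities per query location instead of one per memory location, which is where the efficiency of a guided read would come from. With `k` equal to the number of memory locations, `guided_read` reduces to `dense_read`.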

Results

Task	Dataset	Metric	Value	Model
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	F-measure (Mean)	87.5	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	J&F	84.7	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Jaccard (Mean)	81.9	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2016	F-measure (Mean)	92	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2016	J&F	90.8	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2016	Jaccard (Mean)	89.6	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	F-measure (Mean)	82.5	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	J&F	78.6	HMMN
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Jaccard (Mean)	74.7	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D16 val (F)	90.6	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D16 val (G)	89.4	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D16 val (J)	88.2	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (F)	83.1	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (G)	80.4	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	D17 val (J)	77.7	HMMN
Semi-Supervised Video Object Segmentation	DAVIS (no YouTube-VOS training)	FPS	10	HMMN
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	F-Measure (Seen)	87	HMMN
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	F-Measure (Unseen)	84.6	HMMN
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Jaccard (Seen)	82.1	HMMN
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Jaccard (Unseen)	76.8	HMMN
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Overall	82.6	HMMN
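
The aggregate rows in the table follow from the per-metric rows. On DAVIS, J&F (also reported as G) is the average of the mean Jaccard (J) and the mean F-measure (F); on YouTube-VOS, the overall score is the average of J and F over the seen and unseen categories. A minimal check, with hypothetical helper names and the values copied from the table above:

```python
def davis_j_and_f(j_mean, f_mean):
    # DAVIS J&F (G): average of mean Jaccard and mean F-measure.
    return (j_mean + f_mean) / 2


def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    # YouTube-VOS overall: average of the four seen/unseen J and F scores.
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# (J, F) pairs taken from the table.
davis_rows = {
    "DAVIS 2016": (89.6, 92.0),
    "DAVIS 2017 (val)": (81.9, 87.5),
    "DAVIS 2017 (test-dev)": (74.7, 82.5),
}
for name, (j, f) in davis_rows.items():
    print(name, round(davis_j_and_f(j, f), 1))

print("YouTube-VOS 2018", round(youtube_vos_overall(82.1, 76.8, 87.0, 84.6), 1))
```

The results agree with the reported J&F and Overall values to one decimal place.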
