Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Hierarchical Memory Matching Network for Video Object Segmentation

Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim

Published 2021-09-23 · ICCV 2021 10
Tasks: Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Retrieval
Paper · PDF · Code (official)

Abstract

We present the Hierarchical Memory Matching Network (HMMN) for semi-supervised video object segmentation. Building on a recent memory-based method [33], we propose two advanced memory read modules that enable memory reading at multiple scales while exploiting temporal smoothness. We first propose a kernel guided memory matching module that replaces the non-local dense memory read commonly adopted in previous memory-based methods. The module imposes a temporal smoothness constraint on the memory read, leading to accurate memory retrieval. More importantly, we introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which the memory read at a fine scale is guided by the memory read at a coarse scale. With this module, we perform memory reading at multiple scales efficiently and leverage both high-level semantic and low-level fine-grained memory features to predict detailed object masks. Our network achieves state-of-the-art performance on the validation sets of DAVIS 2016/2017 (90.8% and 84.7%) and YouTube-VOS 2018/2019 (82.6% and 82.5%), and on the test-dev set of DAVIS 2017 (78.6%). The source code and model are available online: https://github.com/Hongje/HMMN.
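The abstract contrasts a non-local dense memory read, in which every query position attends to every memory position, with a top-k guided read, in which the fine-scale read is restricted by the result of a coarse-scale read. The Python sketch below illustrates that idea on toy grids. It is not the HMMN implementation: the dot-product similarity, the softmax-weighted read, the fixed 2x scale factor between levels, and all function names (`dense_read`, `top_k_guided_read`, and so on) are illustrative assumptions, and the kernel guided (temporal smoothness) module is not modeled.

```python
import math

# Illustrative sketch only: dot-product similarity, softmax reads and a fixed
# coarse-to-fine scale factor are assumptions, not HMMN's exact design.
# Feature maps are dicts mapping a grid position (row, col) to a vector.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_read(scores, values):
    """Softmax-normalise scores and return the weighted sum of value vectors."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def dense_read(q_key, mem_keys, mem_vals):
    """Non-local dense read: the query attends to every memory position."""
    positions = list(mem_keys)
    scores = [dot(q_key, mem_keys[p]) for p in positions]
    return softmax_read(scores, [mem_vals[p] for p in positions])


def top_k_positions(q_key, mem_keys, k):
    """Memory positions with the k highest similarities to q_key."""
    ranked = sorted(mem_keys, key=lambda p: dot(q_key, mem_keys[p]), reverse=True)
    return ranked[:k]


def children(pos, scale):
    """Fine-grid positions covered by one coarse-grid cell."""
    r, c = pos
    return [(r * scale + dr, c * scale + dc) for dr in range(scale) for dc in range(scale)]


def top_k_guided_read(q_pos, q_key, coarse_q_keys, coarse_mem_keys,
                      fine_mem_keys, fine_mem_vals, k, scale=2):
    """Fine-scale read restricted to the fine cells under the coarse top-k matches."""
    parent = (q_pos[0] // scale, q_pos[1] // scale)  # coarse cell of this query
    coarse_hits = top_k_positions(coarse_q_keys[parent], coarse_mem_keys, k)
    candidates = [p for hit in coarse_hits for p in children(hit, scale)]
    scores = [dot(q_key, fine_mem_keys[p]) for p in candidates]
    return softmax_read(scores, [fine_mem_vals[p] for p in candidates]), candidates


# Toy example: 2x2 coarse memory, 4x4 fine memory, 2-D keys, 1-D values.
coarse_mem = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (1, 0): [-1.0, 0.0], (1, 1): [0.0, -1.0]}
coarse_query = {(0, 0): [1.0, 0.0]}
fine_keys = {(r, c): [math.cos(r + c), math.sin(r - c)] for r in range(4) for c in range(4)}
fine_vals = {(r, c): [float(4 * r + c)] for r in range(4) for c in range(4)}

out, cands = top_k_guided_read((1, 1), [0.5, 0.5], coarse_query, coarse_mem,
                               fine_keys, fine_vals, k=1)
print(cands)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Setting `k` to the number of coarse memory positions makes every fine position a candidate, so under these assumptions the guided read reduces to the dense read; a smaller `k` shrinks the fine-scale search to `k * scale**2` positions per query, which is the kind of saving that makes multi-scale memory reading affordable.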

Results

All results are for HMMN and are listed identically under three tasks: Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation.

Dataset                           Metric              Value
DAVIS 2017 (val)                  Jaccard (Mean)      81.9
DAVIS 2017 (val)                  F-measure (Mean)    87.5
DAVIS 2017 (val)                  J&F                 84.7
DAVIS 2016                        Jaccard (Mean)      89.6
DAVIS 2016                        F-measure (Mean)    92.0
DAVIS 2016                        J&F                 90.8
DAVIS 2017 (test-dev)             Jaccard (Mean)      74.7
DAVIS 2017 (test-dev)             F-measure (Mean)    82.5
DAVIS 2017 (test-dev)             J&F                 78.6
DAVIS (no YouTube-VOS training)   D16 val (J)         88.2
DAVIS (no YouTube-VOS training)   D16 val (F)         90.6
DAVIS (no YouTube-VOS training)   D16 val (G)         89.4
DAVIS (no YouTube-VOS training)   D17 val (J)         77.7
DAVIS (no YouTube-VOS training)   D17 val (F)         83.1
DAVIS (no YouTube-VOS training)   D17 val (G)         80.4
DAVIS (no YouTube-VOS training)   FPS                 10
YouTube-VOS 2018                  Jaccard (Seen)      82.1
YouTube-VOS 2018                  Jaccard (Unseen)    76.8
YouTube-VOS 2018                  F-measure (Seen)    87.0
YouTube-VOS 2018                  F-measure (Unseen)  84.6
YouTube-VOS 2018                  Overall             82.6

J: region similarity (mean Jaccard index). F: contour accuracy (mean boundary F-measure). J&F and G: mean of J and F. FPS: inference speed in frames per second.
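The composite scores in the results are averages of their components: on DAVIS, J&F (reported as G in the no-YouTube-VOS rows) is the mean of the region similarity J and the contour accuracy F, and the YouTube-VOS Overall score is the mean of the seen and unseen J and F scores. The Python sketch below recomputes each composite from the table's own component values and adds a per-frame region similarity J (intersection over union of two binary masks) for reference. The function names are hypothetical, F (a boundary-matching measure) is not implemented, and scoring J = 1 when both masks are empty is an assumed convention modeled on common DAVIS evaluation practice.

```python
# Hypothetical helper names; F (boundary accuracy) is not implemented here.


def jaccard(pred, gt):
    """Region similarity J for one frame: IoU of two flat binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return 1.0 if union == 0 else inter / union  # both empty -> J = 1 (assumed convention)


def davis_jf(j_mean, f_mean):
    """DAVIS J&F (also reported as G): mean of J and F."""
    return (j_mean + f_mean) / 2


def ytvos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """YouTube-VOS Overall: mean of the four seen/unseen J and F scores."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# Component scores copied from the HMMN results: (computed, reported).
checks = {
    "DAVIS 2017 (val) J&F": (davis_jf(81.9, 87.5), 84.7),
    "DAVIS 2016 J&F": (davis_jf(89.6, 92.0), 90.8),
    "DAVIS 2017 (test-dev) J&F": (davis_jf(74.7, 82.5), 78.6),
    "D16 val G (no YouTube-VOS training)": (davis_jf(88.2, 90.6), 89.4),
    "D17 val G (no YouTube-VOS training)": (davis_jf(77.7, 83.1), 80.4),
    "YouTube-VOS 2018 Overall": (ytvos_overall(82.1, 76.8, 87.0, 84.6), 82.6),
}
for name, (computed, reported) in checks.items():
    print(f"{name}: computed {computed:.3f}, reported {reported}")

print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))  # 1 overlapping pixel out of 3 in the union
```

Every computed composite matches the reported value to within rounding of the published components; for example, the four YouTube-VOS 2018 components average to 82.625, reported as 82.6.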

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)