
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

Xiaohui Zeng, Renjie Liao, Li Gu, Yuwen Xiong, Sanja Fidler, Raquel Urtasun

2019-09-27, ICCV 2019 (October)

Tasks: Semi-Supervised Video Object Segmentation, One-shot visual object segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Rolling Shutter Correction

Abstract

In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem in which the initial object masks are provided. Relying on the Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals at each time step as a linear assignment problem whose cost matrix is predicted by a CNN. We propose a differentiable matching layer by unrolling a projected gradient descent algorithm in which the projection step uses Dykstra's algorithm. We prove that, under mild conditions, the matching is guaranteed to converge to the optimum. In practice, it performs similarly to the Hungarian algorithm during inference, while still allowing us to back-propagate through it to learn the cost matrix. After matching, a refinement head improves the quality of the matched mask. DMM-Net achieves competitive results on YouTube-VOS, the largest video object segmentation dataset. On DAVIS 2017, DMM-Net achieves the best performance among methods that do not use online learning on the first frames. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. Finally, our matching layer is very simple to implement; we attach the PyTorch code (fewer than 50 lines) in the supplementary material. Our code is released at https://github.com/ZENGXH/DMM_Net.
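The matching layer described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' PyTorch code: it assumes the feasible set is {X >= 0, every row sums to 1, every column sums to at most 1} for m templates (rows) and n >= m proposals (columns), and the function names, step size `lr`, and iteration counts are illustrative choices. Each outer step moves X against the gradient of the linear objective (which is just the cost matrix) and then projects back onto the feasible set with Dykstra's cyclic projections. Because every operation is an addition, a scaling, or max(0, ·), the same loop written with autograd tensors could be back-propagated through to learn the cost matrix.

```python
from typing import List

# Hypothetical sketch, not the authors' released PyTorch code.
# Assumed feasible set for m templates (rows) and n >= m proposals (columns):
#   X >= 0, every row sums to 1, every column sums to at most 1.
Matrix = List[List[float]]


def project_rows_sum_to_one(x: Matrix) -> Matrix:
    """Euclidean projection onto the affine set {X : every row sums to 1}."""
    n = len(x[0])
    return [[v - (sum(row) - 1.0) / n for v in row] for row in x]


def project_cols_at_most_one(x: Matrix) -> Matrix:
    """Euclidean projection onto {X : every column sums to at most 1}."""
    m, n = len(x), len(x[0])
    excess = [max(0.0, sum(x[i][j] for i in range(m)) - 1.0) for j in range(n)]
    return [[x[i][j] - excess[j] / m for j in range(n)] for i in range(m)]


def project_nonnegative(x: Matrix) -> Matrix:
    """Euclidean projection onto the nonnegative orthant."""
    return [[max(0.0, v) for v in row] for row in x]


PROJECTIONS = (project_rows_sum_to_one, project_cols_at_most_one, project_nonnegative)


def dykstra_project(z: Matrix, iters: int = 100) -> Matrix:
    """Approximate Euclidean projection of z onto the intersection of the three
    sets above, using Dykstra's cyclic projections with correction terms."""
    m, n = len(z), len(z[0])
    x = [row[:] for row in z]
    corrections = [[[0.0] * n for _ in range(m)] for _ in PROJECTIONS]
    for _ in range(iters):
        for k, project in enumerate(PROJECTIONS):
            shifted = [[x[i][j] + corrections[k][i][j] for j in range(n)] for i in range(m)]
            y = project(shifted)
            # The correction term remembers what this projection removed.
            corrections[k] = [[shifted[i][j] - y[i][j] for j in range(n)] for i in range(m)]
            x = y
    return x


def soft_match(cost: Matrix, steps: int = 50, lr: float = 0.5,
               dykstra_iters: int = 100) -> Matrix:
    """Unrolled projected gradient descent on the linear objective sum(cost * X).
    Its gradient with respect to X is just `cost`, so each step is a shift
    followed by a projection back onto the feasible set."""
    m, n = len(cost), len(cost[0])
    x = [[1.0 / n] * n for _ in range(m)]  # feasible uniform start (requires m <= n)
    for _ in range(steps):
        z = [[x[i][j] - lr * cost[i][j] for j in range(n)] for i in range(m)]
        x = dykstra_project(z, dykstra_iters)
    return x


if __name__ == "__main__":
    cost = [[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]]  # toy 2-template x 3-proposal costs
    x = soft_match(cost)
    # Harden the soft assignment by taking the row-wise argmax.
    print([max(range(len(row)), key=row.__getitem__) for row in x])  # expected: [0, 1]
```

On this toy cost matrix the iterate approaches the 0/1 matrix that assigns template 0 to proposal 0 and template 1 to proposal 1, total cost 1 + 2 = 3, which is also the answer an exact solver such as the Hungarian algorithm returns. The plain-list arithmetic is only for readability; in a trainable model the same steps would be tensor operations so that gradients reach the CNN that predicts `cost`.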

Results

The same three DAVIS 2017 validation results are listed under each of the tasks Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation.

Dataset | Metric | Value | Model
--- | --- | --- | ---
DAVIS 2017 val (no YouTube-VOS training) | J, region similarity (D17 val J) | 68.1 | DMM-Net
DAVIS 2017 val (no YouTube-VOS training) | F, contour accuracy (D17 val F) | 73.3 | DMM-Net
DAVIS 2017 val (no YouTube-VOS training) | G, mean of J and F (D17 val G) | 70.7 | DMM-Net
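The three D17 numbers are related: assuming the standard DAVIS 2017 convention, G is the average of J and F, and indeed (68.1 + 73.3) / 2 = 70.7. The sketch below computes J (the Jaccard index, or intersection-over-union) for one pair of binary masks and the G average. The function names are hypothetical, two empty masks are treated as a perfect match (a common convention), and the contour measure F is omitted because it requires boundary extraction with a distance tolerance.

```python
from typing import List

Mask = List[List[int]]  # binary mask, 1 = object pixel


def region_similarity(pred: Mask, gt: Mask) -> float:
    """J: intersection-over-union (Jaccard index) of two binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f_mean(j_mean: float, f_mean: float) -> float:
    """G: the average of the mean region similarity J and mean contour accuracy F."""
    return (j_mean + f_mean) / 2.0


if __name__ == "__main__":
    print(round(j_and_f_mean(68.1, 73.3), 1))  # expected: 70.7
```

In the benchmark, J and F are first averaged over objects and frames; G is then taken over those two means, which is why it can be recomputed directly from the reported J and F.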

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
- SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
- Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
- U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)