
SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Zhihui Lin, Tianyu Yang, Maomao Li, Ziyu Wang, Chun Yuan, Wenhao Jiang, Wei Liu

Date: 2022-08-22
Venue: CVPR 2022
Tasks: Semi-Supervised Video Object Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation

Abstract

Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS). However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features. Unlike previous methods, which only detect feature redundancy between frames, SWEM merges similar features both within and across frames by leveraging the sequential weighted EM algorithm. Further, adaptive weights for frame features give SWEM the flexibility to represent hard samples, improving the discrimination of templates. In addition, the proposed method maintains a fixed number of template features in memory, which keeps the inference complexity of the VOS system stable. Extensive experiments on the commonly used DAVIS and YouTube-VOS datasets verify the high efficiency (36 FPS) and high performance (84.3\% $\mathcal{J}\&\mathcal{F}$ on the DAVIS 2017 validation set) of SWEM. Code is available at: https://github.com/lmm077/SWEM.
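To make the idea of a fixed-size memory updated by sequential weighted EM more concrete, here is a minimal sketch in plain Python. It is a generic illustration, not the authors' implementation: the class name `SequentialWeightedEM`, the dot-product softmax used in the E-step, the temperature `tau`, the iteration count, and the way per-feature weights are supplied are all assumptions made for this example. The official repository linked in the abstract is the reference for the actual method.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def e_step(features, bases, tau=1.0):
    """Soft-assign each feature to every base (softmax over dot products)."""
    resp = []
    for x in features:
        logits = [dot(x, mu) / tau for mu in bases]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


class SequentialWeightedEM:
    """Hypothetical helper: keeps K bases plus running sufficient statistics."""

    def __init__(self, init_bases):
        self.bases = [list(mu) for mu in init_bases]
        dim = len(init_bases[0])
        self.num = [[0.0] * dim for _ in init_bases]  # running weighted feature sums
        self.den = [0.0 for _ in init_bases]          # running weight totals

    def update(self, features, weights, iters=4, tau=1.0):
        """Fold one frame's features into the K bases; memory size never grows."""
        K, dim = len(self.bases), len(self.bases[0])
        num, den = [row[:] for row in self.num], self.den[:]
        for _ in range(iters):
            resp = e_step(features, self.bases, tau)
            # Restart from the statistics of earlier frames on every iteration,
            # so the current frame is counted exactly once.
            num, den = [row[:] for row in self.num], self.den[:]
            for x, w, z in zip(features, weights, resp):
                for k in range(K):
                    den[k] += w * z[k]
                    for d in range(dim):
                        num[k][d] += w * z[k] * x[d]
            # Weighted M-step: each base is a weighted mean of assigned features.
            self.bases = [
                [num[k][d] / den[k] if den[k] > 0 else self.bases[k][d]
                 for d in range(dim)]
                for k in range(K)
            ]
        self.num, self.den = num, den
        return self.bases


em = SequentialWeightedEM([[1.0, 0.0], [0.0, 1.0]])
em.update([[2.0, 0.1], [1.9, -0.1], [0.1, 2.0]], [1.0, 1.0, 1.0])  # frame 1
em.update([[2.1, 0.0], [0.0, 1.8], [-0.1, 2.2]], [1.0, 2.0, 1.0])  # frame 2
print(len(em.bases))  # the memory still holds 2 bases
```

However many frames are folded in, the memory holds the same K bases, which is the property the abstract credits for stable inference complexity. In this sketch a larger weight simply gives a feature more influence on the bases; how SWEM actually computes its adaptive weights is not described in the abstract.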

Results

The same results are listed under three tasks: Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation.

| Dataset | Metric | Value | Model |
|---|---|---|---|
| MOSE | F | 54.9 | SWEM |
| MOSE | J | 46.8 | SWEM |
| MOSE | J&F | 50.9 | SWEM |
| DAVIS 2017 (val) | F-measure (Mean) | 79.8 | SWEM |
| DAVIS 2017 (val) | J&F | 77.2 | SWEM |
| DAVIS 2017 (val) | Jaccard (Mean) | 74.5 | SWEM |
| DAVIS 2016 | F-measure (Mean) | 89 | SWEM (val) |
| DAVIS 2016 | J&F | 88.1 | SWEM (val) |
| DAVIS 2016 | Jaccard (Mean) | 87.3 | SWEM (val) |
| DAVIS 2016 | Speed (FPS) | 36 | SWEM (val) |
| DAVIS (no YouTube-VOS training) | D16 val (F) | 89 | SWEM |
| DAVIS (no YouTube-VOS training) | D16 val (G) | 88.1 | SWEM |
| DAVIS (no YouTube-VOS training) | D16 val (J) | 87.3 | SWEM |
| DAVIS (no YouTube-VOS training) | D17 val (F) | 79.8 | SWEM |
| DAVIS (no YouTube-VOS training) | D17 val (G) | 77.2 | SWEM |
| DAVIS (no YouTube-VOS training) | D17 val (J) | 74.5 | SWEM |
| DAVIS (no YouTube-VOS training) | FPS | 36 | SWEM |
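In the DAVIS evaluation protocol, the J&F score (also written G) is the average of the mean Jaccard index J and the mean boundary F-measure F. The short check below confirms that the reported values are consistent with that definition up to rounding. The helper name `j_and_f` and the 0.1 tolerance are assumptions made for this illustration.

```python
def j_and_f(j_mean, f_mean):
    """Average of region similarity (J) and boundary accuracy (F)."""
    return (j_mean + f_mean) / 2.0


# (J, F, reported J&F) copied from the results listed above
reported = {
    "MOSE": (46.8, 54.9, 50.9),
    "DAVIS 2017 (val)": (74.5, 79.8, 77.2),
    "DAVIS 2016 (val)": (87.3, 89.0, 88.1),
}

for name, (j, f, jf) in reported.items():
    # The published J and F are already rounded to one decimal,
    # so allow a small tolerance when comparing against J&F.
    consistent = abs(j_and_f(j, f) - jf) <= 0.1
    print(name, "consistent with mean of J and F:", consistent)
```

Each recomputed mean differs from the reported J&F by about 0.05, which is what one would expect when J and F are rounded to one decimal place before averaging.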
