
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe

2020-03-18 | ECCV 2020 8
Tasks: Unsupervised Video Object Segmentation, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation

Abstract

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in individual frames, and then associate these detections over time. Hence, these methods are often non-end-to-end trainable and highly tailored to specific tasks. In this paper, we propose a different approach that is well-suited to a variety of tasks involving instance segmentation in videos. In particular, we model a video clip as a single 3D spatio-temporal volume, and propose a novel approach that segments and tracks instances across space and time in a single stage. Our problem formulation is centered around the idea of spatio-temporal embeddings which are trained to cluster pixels belonging to a specific object instance over an entire video clip. To this end, we introduce (i) novel mixing functions that enhance the feature representation of spatio-temporal embeddings, and (ii) a single-stage, proposal-free network that can reason about temporal context. Our network is trained end-to-end to learn spatio-temporal embeddings as well as parameters required to cluster these embeddings, thus simplifying inference. Our method achieves state-of-the-art results across multiple datasets and tasks. Code and models are available at https://github.com/sabarim/STEm-Seg.
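The idea of clustering pixels over a whole clip by their embeddings can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a Gaussian-style rule in which a pixel joins an instance when its embedding is close enough to an instance center, and the function name `cluster_instance`, the (x, y, t) toy embeddings, the center, the variance, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def cluster_instance(embeddings, center, variance, threshold=0.5):
    """Assign pixels of a clip to one instance (illustrative sketch only).

    embeddings: array of shape (T, H, W, D), one D-dim embedding per pixel.
    center, variance: arrays of shape (D,), assumed clustering parameters.
    Returns a boolean mask of shape (T, H, W).
    """
    diff = embeddings - center  # broadcasts over T, H, W
    # Gaussian-style membership score in (0, 1]; an assumed rule.
    prob = np.exp(-0.5 * np.sum(diff ** 2 / variance, axis=-1))
    return prob > threshold

# Toy clip: 2 frames of 4x4 pixels, treated as a single 3D volume.
T, H, W = 2, 4, 4
t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")

# Hypothetical embeddings: just the (x, y, t) coordinates of each pixel.
embeddings = np.stack([x, y, t], axis=-1).astype(float)

# Hypothetical instance parameters: tight in space, loose in time,
# so the same object is picked up in both frames.
center = np.array([1.0, 1.0, 0.5])
variance = np.array([0.5, 0.5, 4.0])

mask = cluster_instance(embeddings, center, variance)
print(mask.astype(int))
```

Because the clip is handled as one volume, a single call yields a mask that spans both frames, so segmentation and association over time happen in the same step in this toy setting.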

Results

Task | Dataset | Metric | Value | Model
Video | DAVIS 2017 (val) | F-measure (Mean) | 67.8 | STEm-Seg
Video | DAVIS 2017 (val) | F-measure (Recall) | 75.5 | STEm-Seg
Video | DAVIS 2017 (val) | J&F | 64.7 | STEm-Seg
Video | DAVIS 2017 (val) | Jaccard (Mean) | 61.5 | STEm-Seg
Video | DAVIS 2017 (val) | Jaccard (Recall) | 70.4 | STEm-Seg
Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 67.8 | STEm-Seg
Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 75.5 | STEm-Seg
Video Object Segmentation | DAVIS 2017 (val) | J&F | 64.7 | STEm-Seg
Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 61.5 | STEm-Seg
Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 70.4 | STEm-Seg
Video Instance Segmentation | YouTube-VIS validation | AP50 | 55.8 | STEm-Seg (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AP75 | 37.9 | STEm-Seg (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AR1 | 34.4 | STEm-Seg (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AR10 | 41.6 | STEm-Seg (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | mask AP | 34.6 | STEm-Seg (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AP50 | 50.7 | STEm-Seg (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | AP75 | 37.9 | STEm-Seg (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | AR1 | 34.4 | STEm-Seg (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | AR10 | 41.6 | STEm-Seg (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | mask AP | 30.6 | STEm-Seg (ResNet-50)
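
As a quick consistency check on the DAVIS 2017 rows, the snippet below recomputes J&F from the two mean scores. It assumes the usual DAVIS convention that J&F is the arithmetic mean of Jaccard (Mean) and F-measure (Mean); the variable names are illustrative only. The mean of 61.5 and 67.8 is 64.65, which agrees with the listed 64.7 after rounding to one decimal.

```python
# Values copied from the DAVIS 2017 (val) rows of the results table.
jaccard_mean = 61.5
f_measure_mean = 67.8
reported_j_and_f = 64.7

# Assumed convention: J&F is the average of the two mean scores.
j_and_f = (jaccard_mean + f_measure_mean) / 2

print(f"Recomputed J&F: {j_and_f:.2f}")
print(f"Matches reported value within rounding: {abs(j_and_f - reported_j_and_f) < 0.1}")
```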

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)