STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

Zhengkai Jiang, Zhangxuan Gu, Jinlong Peng, Hang Zhou, Liang Liu, Yabiao Wang, Ying Tai, Chengjie Wang, Liqing Zhang

2022-02-08

Segmentation, Semantic Segmentation, Contrastive Learning, Instance Segmentation, Video Instance Segmentation

Abstract

Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video. Recent VIS approaches rely on sophisticated pipelines to achieve this goal, including RoI-related operations or 3D convolutions. In contrast, we present a simple and efficient single-stage VIS framework based on the instance segmentation method CondInst by adding an extra tracking head. To improve instance association accuracy, a novel bi-directional spatio-temporal contrastive learning strategy for tracking embedding across frames is proposed. Moreover, an instance-wise temporal consistency scheme is utilized to produce temporally coherent results. Experiments conducted on the YouTube-VIS-2019, YouTube-VIS-2021, and OVIS-2021 datasets validate the effectiveness and efficiency of the proposed method. We hope the proposed framework can serve as a simple and strong alternative for many other instance-level video association tasks.
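
To make the idea of bi-directional contrastive learning on tracking embeddings more concrete, here is a minimal NumPy sketch. It is not the loss defined in the paper: the abstract gives no formula, so this uses a generic symmetric InfoNCE-style objective as a stand-in. The function names, the `temperature` parameter, and the convention that row `i` of both inputs belongs to the same instance in two different frames are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def bidirectional_contrastive_loss(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE-style loss between two frames (illustrative only).

    emb_a, emb_b: arrays of shape (N, D). Row i of each array is assumed
    to be the tracking embedding of the same instance in frame A and
    frame B. This pairing convention is hypothetical, not from the paper.
    """
    # L2-normalize so that the dot product is cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    sim = a @ b.T / temperature
    idx = np.arange(sim.shape[0])

    # Direction A -> B: each instance in A should pick its match in B.
    loss_ab = -log_softmax(sim, axis=1)[idx, idx].mean()
    # Direction B -> A: each instance in B should pick its match in A.
    loss_ba = -log_softmax(sim, axis=0)[idx, idx].mean()

    return 0.5 * (loss_ab + loss_ba)


# Toy example: three instances with orthogonal embeddings.
emb = np.eye(3)
matched = bidirectional_contrastive_loss(emb, emb)
shuffled = bidirectional_contrastive_loss(emb, emb[[1, 2, 0]])
print(matched < shuffled)
```

In this sketch, correctly paired embeddings give a loss near zero, while pairing each instance with the wrong one gives a much larger loss. Averaging the two directions keeps the objective symmetric in the two frames.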

Results

Task	Dataset	Metric	Value	Model
Video Instance Segmentation	YouTube-VIS validation	AP50	57.2	STC (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AP75	38.6	STC (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AR1	36.9	STC (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AR10	44.5	STC (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	mask AP	36.7	STC (ResNet-50)
Video Instance Segmentation	OVIS validation	AP50	33.5	STC (ResNet-50)
Video Instance Segmentation	OVIS validation	AP75	13.4	STC (ResNet-50)
Video Instance Segmentation	OVIS validation	mask AP	15.5	STC (ResNet-50)
