Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Crossover Learning for Fast Online Video Instance Segmentation

Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

2021-04-13 · ICCV 2021 10
Tasks: Semantic Segmentation, Instance Segmentation, Video Understanding, Video Instance Segmentation
Paper · PDF · Code (official)

Abstract

Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance in other frames at the pixel level. Unlike previous schemes, crossover learning does not require any additional network parameters for feature enhancement. By integrating with the instance segmentation loss, crossover learning enables efficient cross-frame instance-to-pixel relation learning and brings cost-free improvement during inference. In addition, a global balanced instance embedding branch is proposed for more accurate and more stable online instance association. We conduct extensive experiments on three challenging VIS benchmarks, i.e., YouTube-VIS-2019, OVIS, and YouTube-VIS-2021, to evaluate our method. To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy. Code will be available to facilitate future research.
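The crossover idea can be illustrated with a minimal NumPy sketch. It assumes a CondInst-style setup in which each instance gets a dynamic filter that is applied to a frame's mask features, simplified here to a single 1x1 filter (a weight vector plus bias), and it assumes a dice loss for mask supervision. The names `dynamic_mask`, `dice_loss`, and `crossover_loss` are hypothetical; this is not the official CrossVIS implementation. The point it shows is that the kernel generated in frame t is simply reused on the mask features of frame t+d (and vice versa) and supervised with that frame's ground-truth mask, so no new parameters are introduced.

```python
import numpy as np


def dynamic_mask(kernel, mask_feats):
    """Apply one instance's dynamic 1x1 filter to mask features.

    kernel: (weights of shape (C,), scalar bias), generated per instance.
    mask_feats: array of shape (C, H, W) from one frame's mask branch.
    Returns per-pixel foreground probabilities of shape (H, W).
    """
    weights, bias = kernel
    logits = np.tensordot(weights, mask_feats, axes=([0], [0])) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a probability map and a binary mask (assumed loss)."""
    inter = (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum()
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def crossover_loss(kernel_t, kernel_d, feats_t, feats_d, gt_t, gt_d):
    """Static plus crossover mask losses for one instance seen in frames t and t+d."""
    # Static terms: each frame's kernel segments the instance in its own frame.
    static = (dice_loss(dynamic_mask(kernel_t, feats_t), gt_t)
              + dice_loss(dynamic_mask(kernel_d, feats_d), gt_d))
    # Crossover terms: the kernel from one frame segments the same instance
    # in the other frame; the same kernels are reused, nothing new is learned.
    cross = (dice_loss(dynamic_mask(kernel_t, feats_d), gt_d)
             + dice_loss(dynamic_mask(kernel_d, feats_t), gt_t))
    return static + cross


def toy_feats(gt):
    """Hypothetical 2-channel features: channel 0 marks the instance strongly."""
    feats = np.zeros((2,) + gt.shape)
    feats[0] = np.where(gt > 0, 10.0, -10.0)
    return feats


# Toy 4x4 frame pair in which the instance shifts by one pixel.
gt_t = np.zeros((4, 4))
gt_t[:2, :2] = 1.0
gt_d = np.zeros((4, 4))
gt_d[1:3, 1:3] = 1.0
feats_t, feats_d = toy_feats(gt_t), toy_feats(gt_d)

good_kernel = (np.array([1.0, 0.0]), 0.0)   # responds to the instance channel
bad_kernel = (np.array([-1.0, 0.0]), 0.0)   # responds to the background instead
good = crossover_loss(good_kernel, good_kernel, feats_t, feats_d, gt_t, gt_d)
bad = crossover_loss(bad_kernel, bad_kernel, feats_t, feats_d, gt_t, gt_d)
print(f"good={good:.4f}, bad={bad:.4f}")
```

A kernel that captures the instance's appearance scores well on both frames, while one that does not is penalized by all four terms; at inference only the static path is used, which is consistent with the abstract's claim that the improvement comes at no extra inference cost.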

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Video Instance Segmentation | YouTube-VIS validation | AP50 | 57.3 | CrossVIS (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AP75 | 39.7 | CrossVIS (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AR1 | 36 | CrossVIS (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | AR10 | 42 | CrossVIS (ResNet-101)
Video Instance Segmentation | YouTube-VIS validation | mask AP | 36.6 | CrossVIS (ResNet-101)
Video Instance Segmentation | OVIS validation | AP50 | 35.5 | CrossVIS (ResNet-50, calibration)
Video Instance Segmentation | OVIS validation | AP75 | 16.9 | CrossVIS (ResNet-50, calibration)
Video Instance Segmentation | OVIS validation | mask AP | 18.1 | CrossVIS (ResNet-50, calibration)
Video Instance Segmentation | OVIS validation | AP50 | 32.7 | CrossVIS (ResNet-50)
Video Instance Segmentation | OVIS validation | AP75 | 12.1 | CrossVIS (ResNet-50)
Video Instance Segmentation | OVIS validation | mask AP | 14.9 | CrossVIS (ResNet-50)
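The two OVIS rows for ResNet-50 differ only in the "calibration" variant, whose details are not described on this page. The difference between them is plain subtraction of the table values; a small sketch that computes it, with the dictionary names chosen here purely for illustration:

```python
# Values copied from the OVIS validation rows of the results table.
ovis_r50 = {"AP50": 32.7, "AP75": 12.1, "mask AP": 14.9}
ovis_r50_calib = {"AP50": 35.5, "AP75": 16.9, "mask AP": 18.1}

# Absolute gain of the calibration variant, rounded to one decimal place
# to match the precision of the reported numbers.
gains = {m: round(ovis_r50_calib[m] - ovis_r50[m], 1) for m in ovis_r50}
print(gains)  # {'AP50': 2.8, 'AP75': 4.8, 'mask AP': 3.2}
```

The largest gain appears at the stricter AP75 threshold.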

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
- VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
- SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
- Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)