
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

2022-07-21
Segmentation, Semantic Segmentation, Video Object Segmentation, Contrastive Learning, Instance Segmentation, Video Semantic Segmentation, Video Instance Segmentation

Abstract

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance. However, online methods have their inherent advantage in handling long video sequences and ongoing videos while offline models fail due to the limit of computational resources. Therefore, it would be highly desirable if online models can achieve comparable or even better performance than offline models. By dissecting current online models and offline models, we demonstrate that the main cause of the performance gap is the error-prone association between frames caused by the similar appearance among different instances in the feature space. Observing this, we propose an online framework based on contrastive learning that is able to learn more discriminative instance embeddings for association and fully exploit history information for stability. Despite its simplicity, our method outperforms all online and offline methods on three benchmarks. Specifically, we achieve 49.5 AP on YouTube-VIS 2019, a significant improvement of 13.2 AP and 2.1 AP over the prior online and offline art, respectively. Moreover, we achieve 30.2 AP on OVIS, a more challenging dataset with significant crowding and occlusions, surpassing the prior art by 14.8 AP. The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022). We hope the simplicity and effectiveness of our method, as well as our insight into current methods, could shed light on the exploration of VIS models.
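The abstract names two ingredients: contrastive learning to make instance embeddings more discriminative, and use of history information to stabilize frame-to-frame association. The sketch below illustrates both ideas in plain Python under stated assumptions; it is not the authors' implementation. The function names (`info_nce`, `associate`), the cosine-similarity matching, the greedy assignment, the momentum-averaged memory, and all default values (`temperature`, `threshold`, `momentum`) are hypothetical choices made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed stand-in for the paper's loss).

    The loss is small when `query` is more similar to `positive`
    (same instance, another frame) than to every entry in `negatives`.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]


def associate(memory, detections, threshold=0.5, momentum=0.8):
    """Greedy embedding-based association with a momentum memory (hypothetical).

    memory:     dict mapping track id -> embedding, updated in place
    detections: list of embeddings for the current frame
    Returns one track id per detection; unmatched detections start new tracks.
    """
    # Score every (track, detection) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(emb, det), tid, j)
         for tid, emb in memory.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    assigned = [None] * len(detections)
    used_tracks = set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if tid in used_tracks or assigned[j] is not None:
            continue
        assigned[j] = tid
        used_tracks.add(tid)

    next_id = max(memory, default=-1) + 1
    for j, det in enumerate(detections):
        if assigned[j] is None:
            # No sufficiently similar track: open a new one.
            assigned[j] = next_id
            memory[next_id] = list(det)
            next_id += 1
        else:
            # Blend the new embedding into the track's history.
            old = memory[assigned[j]]
            memory[assigned[j]] = [
                momentum * o + (1 - momentum) * d for o, d in zip(old, det)
            ]
    return assigned
```

A possible usage pattern: start with an empty `memory = {}`, then call `associate(memory, frame_embeddings)` once per frame in temporal order. Because only the memory is carried forward, the per-frame cost does not depend on the number of frames already processed, which is the property that makes an online formulation suitable for long or ongoing videos.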

Results

Task | Dataset | Metric | Value | Model
Video Instance Segmentation | YouTube-VIS 2021 | AP50 | 80.8 | IDOL (Swin-L)
Video Instance Segmentation | YouTube-VIS 2021 | AP75 | 63.5 | IDOL (Swin-L)
Video Instance Segmentation | YouTube-VIS 2021 | AR1 | 45 | IDOL (Swin-L)
Video Instance Segmentation | YouTube-VIS 2021 | AR10 | 60.1 | IDOL (Swin-L)
Video Instance Segmentation | YouTube-VIS 2021 | mask AP | 56.1 | IDOL (Swin-L)
Video Instance Segmentation | YouTube-VIS validation | AP50 | 74 | IDOL (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | AP75 | 52.9 | IDOL (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | AR1 | 47.7 | IDOL (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | AR10 | 58.7 | IDOL (ResNet-50)
Video Instance Segmentation | YouTube-VIS validation | mask AP | 49.5 | IDOL (ResNet-50)
Video Instance Segmentation | OVIS validation | AP50 | 65.7 | IDOL (Swin-L)
Video Instance Segmentation | OVIS validation | AP75 | 45.2 | IDOL (Swin-L)
Video Instance Segmentation | OVIS validation | AR1 | 17.9 | IDOL (Swin-L)
Video Instance Segmentation | OVIS validation | AR10 | 49.6 | IDOL (Swin-L)
Video Instance Segmentation | OVIS validation | mask AP | 42.6 | IDOL (Swin-L)
Video Instance Segmentation | OVIS validation | AP50 | 51.3 | IDOL (ResNet-50)
Video Instance Segmentation | OVIS validation | AP75 | 30 | IDOL (ResNet-50)
Video Instance Segmentation | OVIS validation | AR1 | 15 | IDOL (ResNet-50)
Video Instance Segmentation | OVIS validation | AR10 | 37.5 | IDOL (ResNet-50)
Video Instance Segmentation | OVIS validation | mask AP | 30.2 | IDOL (ResNet-50)
