Mask2Former for Video Instance Segmentation

Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

2021-12-20
Tasks: Panoptic Segmentation, Segmentation, Semantic Segmentation, Video Segmentation, Instance Segmentation, Video Semantic Segmentation, Video Instance Segmentation, Image Segmentation

Abstract

We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouTubeVIS-2019 and 52.6 AP on YouTubeVIS-2021. We believe Mask2Former is also capable of handling video semantic and panoptic segmentation, given its versatility in image segmentation. We hope this will make state-of-the-art video segmentation research more accessible and bring more attention to designing universal image and video segmentation architectures.
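
The idea of "directly predicting 3D segmentation volumes" can be illustrated with a minimal, dependency-free sketch. It assumes, as in the MaskFormer family, that each query's mask logit at a location is the dot product between the query embedding and the per-pixel feature there. The function name `predict_mask_volumes`, the nested-list layout, and the toy values are hypothetical and are not taken from the official code.

```python
from typing import List

Vector = List[float]
# Features are indexed as features[t][y][x] -> C-dimensional vector.
FeatureVolume = List[List[List[Vector]]]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_mask_volumes(
    queries: List[Vector], features: FeatureVolume
) -> List[List[List[List[bool]]]]:
    """Return masks[q][t][y][x]: one binary T x H x W volume per query.

    A logit > 0 is equivalent to sigmoid(logit) > 0.5.
    """
    return [
        [[[dot(q, pixel) > 0.0 for pixel in row] for row in frame]
         for frame in features]
        for q in queries
    ]


# Toy clip (assumed values): T=2 frames, H=1, W=2, C=2 channels.
features = [
    [[[1.0, -1.0], [-1.0, 1.0]]],  # frame 0: object A left, object B right
    [[[-1.0, 1.0], [1.0, -1.0]]],  # frame 1: the two objects swapped sides
]
queries = [[1.0, 0.0], [0.0, 1.0]]  # query 0 responds to A, query 1 to B

masks = predict_mask_volumes(queries, features)
print(masks[0])  # volume for object A across both frames
```

Because each query produces a mask over all frames at once, the same query index identifies the same instance throughout the clip, so in this sketch no separate tracking or frame-to-frame association step is needed.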

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 84.4 | Mask2Former (Swin-L) |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 67 | Mask2Former (Swin-L) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 60.4 | Mask2Former (Swin-L) |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 72.8 | Mask2Former (ResNet-101) |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 54.2 | Mask2Former (ResNet-101) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 49.2 | Mask2Former (ResNet-101) |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 68 | Mask2Former (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 50 | Mask2Former (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 46.4 | Mask2Former (ResNet-50) |
| Video Instance Segmentation | OVIS validation | AP50 | 36.9 | Mask2Former-VIS |
| Video Instance Segmentation | OVIS validation | AP75 | 14.1 | Mask2Former-VIS |
| Video Instance Segmentation | OVIS validation | AR1 | 9.9 | Mask2Former-VIS |
| Video Instance Segmentation | OVIS validation | AR10 | 24.7 | Mask2Former-VIS |
| Video Instance Segmentation | OVIS validation | mask AP | 16.6 | Mask2Former-VIS |
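
To make the AP50 and AP75 metrics concrete: in video instance segmentation, a prediction is matched to a ground-truth instance using IoU computed over the whole mask sequence rather than per frame, and AP50 and AP75 use match thresholds of 0.5 and 0.75. The following is a minimal sketch of that spatio-temporal IoU, assuming masks are represented as sets of `(frame, row, column)` voxels. The helper `video_iou` and the toy masks are hypothetical, and a full evaluation would additionally rank predictions by confidence and average over categories.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (frame index, row, column)


def video_iou(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """IoU over the whole clip: intersections and unions are summed across frames."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 0.0


# Toy two-frame example (assumed values): the prediction drifts one pixel in frame 1.
gt = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)}
pred = {(0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 0, 2)}

iou = video_iou(pred, gt)      # 3 shared voxels out of 5 in the union
matches_at_50 = iou >= 0.5     # counts as a true positive for AP50
matches_at_75 = iou >= 0.75    # too loose to count for AP75
print(iou, matches_at_50, matches_at_75)
```

The same prediction can therefore help AP50 while hurting AP75, which is one reason the two columns in the table can differ widely for the same model.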
