MSN: Efficient Online Mask Selection Network for Video Instance Segmentation

Vidit Goel, Jiachen Li, Shubhika Garg, Harsh Maheshwari, Humphrey Shi

2021-06-19

Tasks: Segmentation, Semantic Segmentation, Video Object Segmentation, Instance Segmentation, Video Semantic Segmentation, Video Instance Segmentation


Abstract

In this work we present a novel solution for Video Instance Segmentation (VIS), that is, automatically generating instance-level segmentation masks together with object classes and tracking them across a video. Our method refines the masks from the segmentation and propagation branches in an online manner using the Mask Selection Network (MSN), which limits noise accumulation during mask tracking. We propose an effective design of MSN based on a patch-based convolutional neural network. The network can distinguish very subtle differences between masks and accurately choose the better of the associated masks. Further, we exploit temporal consistency by processing the video sequences in both forward and reverse directions as a post-processing step to recover lost objects. The proposed method can be used to adapt any video object segmentation method to the task of VIS. Our method achieves a score of 49.1 mAP on the 2021 YouTube-VIS Challenge and ranked third among more than 30 global teams. Our code will be available at https://github.com/SHI-Labs/Mask-Selection-Networks.
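The abstract describes MSN only at a high level. The following is a minimal sketch of the selection idea under stated assumptions, not the authors' implementation. `patch_scorer` is a hypothetical stand-in for the trained patch-based CNN; scoring only the patches where the two candidate masks disagree, averaging the patch scores, the `propagate` hook, and the empty-mask rule in `merge_bidirectional` are all our own illustrative choices. The key point it shows is that propagation always starts from the previously *selected* mask, so a bad propagated mask can be replaced by the segmentation-branch mask before its errors accumulate.

```python
import numpy as np


def disagreement_patches(mask_a, mask_b, patch=8):
    """Yield top-left corners of patch x patch windows where the two masks differ.

    Assumption: windows where both candidates agree carry no signal for selection.
    """
    diff = mask_a != mask_b
    h, w = diff.shape
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            if diff[r:r + patch, c:c + patch].any():
                yield r, c


def select_mask(image, mask_a, mask_b, patch_scorer, patch=8):
    """Pick the better of two candidate masks for one object in one frame.

    patch_scorer(img_patch, a_patch, b_patch) returns the probability that
    candidate A is better; it is a hypothetical stand-in for the trained
    patch-based CNN. Returns (chosen_mask, mean_probability_for_A).
    """
    scores = [
        patch_scorer(image[r:r + patch, c:c + patch],
                     mask_a[r:r + patch, c:c + patch],
                     mask_b[r:r + patch, c:c + patch])
        for r, c in disagreement_patches(mask_a, mask_b, patch)
    ]
    if not scores:  # identical candidates: nothing to choose between
        return mask_a, 0.5
    p_a = float(np.mean(scores))
    return (mask_a if p_a >= 0.5 else mask_b), p_a


def track_online(frames, seg_masks, propagate, patch_scorer):
    """Track one object online: at each frame, choose between the
    segmentation-branch mask and the mask propagated from the previous
    selected mask. propagate(prev_frame, prev_mask, frame) is a hypothetical
    hook for any video object segmentation propagation step.
    """
    selected = [seg_masks[0]]
    for t in range(1, len(frames)):
        propagated = propagate(frames[t - 1], selected[-1], frames[t])
        best, _ = select_mask(frames[t], seg_masks[t], propagated, patch_scorer)
        selected.append(best)
    return selected


def merge_bidirectional(forward, backward):
    """Hypothetical merge rule for the forward/reverse post-processing pass:
    where the forward pass lost the object (empty mask), fall back to the
    reverse-pass mask for the same frame (both lists in forward time order)."""
    return [f if f.any() else b for f, b in zip(forward, backward)]


# Toy scorer for demonstration only: prefers the candidate that better matches
# the bright pixels of the image patch. A real MSN would be a trained CNN.
def toy_scorer(img_patch, a_patch, b_patch):
    fg = img_patch > 0.5
    acc_a = float((a_patch == fg).mean())
    acc_b = float((b_patch == fg).mean())
    return 1.0 if acc_a > acc_b else 0.0 if acc_a < acc_b else 0.5


image = np.zeros((16, 16))
image[4:12, 4:12] = 1.0
good = image > 0.5                     # tightly fits the bright square
shifted = np.zeros_like(good)
shifted[5:13, 5:13] = True             # off by one pixel diagonally
best, p_a = select_mask(image, good, shifted, toy_scorer)
print(best is good, p_a)               # True 1.0
```

To run the reverse pass in this sketch, one would call `track_online` on `frames[::-1]` and `seg_masks[::-1]`, reverse the result back into forward time order, and pass it to `merge_bidirectional` together with the forward result.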

Results

Task	Dataset	Metric	Value (%)	Model
Video Instance Segmentation	YouTube-VIS validation	AP50	69.4	MSN
Video Instance Segmentation	YouTube-VIS validation	AP75	54.9	MSN
Video Instance Segmentation	YouTube-VIS validation	AR1	40.1	MSN
Video Instance Segmentation	YouTube-VIS validation	AR10	55	MSN
Video Instance Segmentation	YouTube-VIS validation	mask AP	48.8	MSN
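The metrics above follow the YouTube-VIS protocol, which, as we understand it, extends COCO-style evaluation to video: a predicted instance is matched to a ground-truth instance with an IoU computed over the whole sequence rather than per frame. AP50 and AP75 count a match as correct at IoU thresholds of 0.5 and 0.75, mask AP follows the COCO convention of averaging AP over thresholds from 0.50 to 0.95 in steps of 0.05, and AR1/AR10 report recall when at most 1 or 10 predictions per video are kept. The sketch below shows only the sequence-level IoU and how one prediction can be a hit at AP50 yet a miss at AP75; `video_iou` is an illustrative helper, not the official evaluation code.

```python
import numpy as np


def video_iou(pred, gt):
    """Spatio-temporal IoU between two mask sequences of shape (T, H, W):
    intersections and unions are summed over all frames before dividing."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention chosen here: two empty sequences score 0.0.
    return float(inter) / float(union) if union else 0.0


gt = np.ones((2, 4, 4), dtype=bool)   # object fills both 4x4 frames
pred = gt.copy()
pred[1] = False
pred[1, 0] = True                      # frame 2: only the top row is predicted
iou = video_iou(pred, gt)              # (16 + 4) / (16 + 16) = 0.625
print(iou >= 0.5, iou >= 0.75)         # True False
```

Because intersections and unions are pooled over frames, a prediction that is perfect in some frames but drifts in others is penalized in proportion to how many pixels it gets wrong across the whole clip, which is why limiting noise accumulation during tracking matters for these scores.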
