
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Jiale Cao, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao

Published: 2020-07-29
Venue: ECCV 2020 8
Tasks: Real-time Instance Segmentation, Segmentation, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation, Object Detection

Abstract

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but are still lagging behind in accuracy, compared to two-stage methods. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating mask prediction of an instance to different sub-regions of a detected bounding-box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, our SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate our SipMask for real-time video instance segmentation, achieving promising results on YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
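
The idea of predicting a separate set of spatial coefficients for each sub-region of a bounding box can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked in the abstract for that). It assumes a YOLACT-style setup in which a mask is a linear combination of shared basis masks followed by a sigmoid; the 2x2 grid, the array shapes, and the names `assemble_mask`, `basis`, and `coeffs` are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def assemble_mask(basis, coeffs, box):
    """Build one instance mask from per-sub-region coefficients.

    basis:  (K, H, W) array of shared basis masks (assumed layout).
    coeffs: (R, C, K) array, one K-vector of coefficients for each
            cell of an R x C grid laid over the box (hypothetical layout).
    box:    (x1, y1, x2, y2) integer pixel bounds, upper bounds exclusive.
    Returns an (H, W) array that is zero outside the box.
    """
    _, height, width = basis.shape
    rows, cols, _ = coeffs.shape
    x1, y1, x2, y2 = box
    mask = np.zeros((height, width), dtype=np.float64)

    # Split the box into rows x cols sub-regions along each axis.
    ys = np.linspace(y1, y2, rows + 1).round().astype(int)
    xs = np.linspace(x1, x2, cols + 1).round().astype(int)

    for r in range(rows):
        for c in range(cols):
            # Basis masks restricted to this sub-region: (K, h, w).
            region = basis[:, ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            # Combine them with this sub-region's own coefficients.
            logits = np.tensordot(coeffs[r, c], region, axes=1)
            mask[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] = sigmoid(logits)
    return mask
```

With a 1x1 grid the function reduces to a single coefficient vector for the whole box, which is the baseline the abstract contrasts against; a finer grid lets each part of the box weight the basis masks differently.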

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Instance Segmentation | COCO test-dev | AP50 | 60.2 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | COCO test-dev | AP75 | 40.8 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | COCO test-dev | APL | 54.3 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | COCO test-dev | APM | 40.8 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | COCO test-dev | APS | 17.8 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | COCO test-dev | mask AP | 38.1 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | AP50 | 55.6 | SipMask++ (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | AP75 | 37.6 | SipMask++ (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | APL | 56.8 | SipMask++ (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | APM | 38.3 | SipMask++ (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | APS | 11.2 | SipMask++ (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | mask AP | 35.4 | SipMask++ (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | AP50 | 53.4 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | AP75 | 34.3 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | APL | 54 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | APM | 35.6 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | APS | 9.3 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | mask AP | 32.8 | SipMask (ResNet-101, single-scale test) |
| Instance Segmentation | MSCOCO | AP50 | 51.9 | SipMask (ResNet-50, single-scale test) |
| Instance Segmentation | MSCOCO | AP75 | 32.3 | SipMask (ResNet-50, single-scale test) |
| Instance Segmentation | MSCOCO | APL | 49.8 | SipMask (ResNet-50, single-scale test) |
| Instance Segmentation | MSCOCO | APM | 33.6 | SipMask (ResNet-50, single-scale test) |
| Instance Segmentation | MSCOCO | APS | 9.2 | SipMask (ResNet-50, single-scale test) |
| Instance Segmentation | MSCOCO | mask AP | 31.2 | SipMask (ResNet-50, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 54.1 | SipMask (ResNet-50, ms-train, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 35.8 | SipMask (ResNet-50, ms-train, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AR1 | 35.4 | SipMask (ResNet-50, ms-train, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AR10 | 40.1 | SipMask (ResNet-50, ms-train, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 33.7 | SipMask (ResNet-50, ms-train, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 53 | SipMask (ResNet-50, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 33.3 | SipMask (ResNet-50, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AR1 | 33.5 | SipMask (ResNet-50, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | AR10 | 38.9 | SipMask (ResNet-50, single-scale test) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 32.5 | SipMask (ResNet-50, single-scale test) |
