Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Minghan Li, Shuai Li, Lida Li, Lei Zhang

2021-04-06 | CVPR 2021
Tasks: Segmentation, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation

Abstract

Modern one-stage video instance segmentation networks suffer from two limitations. First, convolutional features are aligned with neither anchor boxes nor ground-truth bounding boxes, which reduces the sensitivity of the masks to spatial location. Second, a video is directly divided into individual frames for frame-level instance segmentation, ignoring the temporal correlation between adjacent frames. To address these issues, we propose a simple yet effective one-stage video instance segmentation framework based on spatial calibration and temporal fusion, named STMask. To ensure spatial feature calibration with ground-truth bounding boxes, we first predict regressed bounding boxes around ground-truth bounding boxes and extract features from them for frame-level instance segmentation. To further exploit temporal correlation among video frames, we incorporate a temporal fusion module that infers instance masks from each frame to its adjacent frames, which helps our framework handle challenging conditions such as motion blur, partial occlusion, and unusual object-to-camera poses. Experiments on the YouTube-VIS valid set show that the proposed STMask with ResNet-50/-101 backbones obtains 33.5% / 36.8% mask AP while running at 28.6 / 23.4 FPS on video instance segmentation. The code is available at https://github.com/MinghanLi/STMask.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Instance Segmentation | YouTube-VIS 2021 | AP50 | 54 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS 2021 | AP75 | 38 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS 2021 | AR1 | 29.4 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS 2021 | AR10 | 39.1 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS 2021 | mask AP | 34.6 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 56.8 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 38 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS validation | AR1 | 34.8 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS validation | AR10 | 41.8 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 36.8 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | AP50 | 35.4 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | AP75 | 15.2 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | APho | 23.7 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | APmo | 14.7 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | APso | 11.1 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | AR1 | 8.4 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | AR10 | 23.1 | STMask(R101-DCN-FPN) |
| Video Instance Segmentation | OVIS validation | mask AP | 17.3 | STMask(R101-DCN-FPN) |
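
The AP50 and AP75 rows report average precision at mask IoU thresholds of 0.5 and 0.75. This page does not describe the evaluation protocol, so the following is only a minimal sketch. It assumes the spatio-temporal IoU commonly used for video instance segmentation benchmarks, in which intersections and unions are summed over all frames of a track before dividing. The function names `video_mask_iou` and `matches_at` are hypothetical and are not taken from the STMask code or any benchmark toolkit.

```python
from typing import Sequence

# One binary mask per frame, given as rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def video_mask_iou(pred: Sequence[Mask], gt: Sequence[Mask]) -> float:
    """Spatio-temporal IoU of two mask tracks.

    Sums per-frame intersections and unions over the whole track,
    then divides. A frame where an instance is absent is an all-zero mask.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same number of frames")
    inter = 0
    union = 0
    for p_frame, g_frame in zip(pred, gt):
        for p_row, g_row in zip(p_frame, g_frame):
            for p, g in zip(p_row, g_row):
                inter += 1 if (p and g) else 0
                union += 1 if (p or g) else 0
    # Two empty tracks have no overlap to measure; treat as 0.0 (an assumption).
    return inter / union if union else 0.0


def matches_at(iou: float, threshold: float) -> bool:
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou >= threshold


# Toy two-frame track on a 2x2 grid (illustrative data only).
pred_track = [
    [[1, 1], [0, 0]],  # frame 0: identical to ground truth
    [[1, 0], [0, 1]],  # frame 1: one correct pixel, one stray pixel
]
gt_track = [
    [[1, 1], [0, 0]],
    [[1, 1], [0, 0]],
]

iou = video_mask_iou(pred_track, gt_track)  # 3 shared pixels / 5 in the union
```

In this toy case the track would count as a match at the 0.5 threshold but not at 0.75, which is why AP75 is always at most AP50 for the same model and dataset.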