Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan

2023-08-29 | Segmentation, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation

Abstract

Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to frame-by-frame online processing. However, the recent success of online methods questions this belief, in particular for challenging and long video sequences. We understand this work as a rebuttal of those recent observations and an appeal to the community to focus on dedicated near-online VIS approaches. To support our argument, we present a detailed analysis of different processing paradigms and the new end-to-end trainable NOVIS (Near-Online Video Instance Segmentation) method. Our transformer-based model directly predicts spatio-temporal mask volumes for clips of frames and performs instance tracking between clips via overlap embeddings. NOVIS represents the first near-online VIS approach that avoids any handcrafted tracking heuristics. We outperform all existing VIS methods by large margins and provide new state-of-the-art results on the YouTube-VIS (2019/2021) and OVIS benchmarks.
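The processing paradigms contrasted in the abstract can be pictured as different ways of windowing a video. The sketch below is illustrative only and is not the authors' implementation: `clip_windows`, `clip_len`, and `stride` are hypothetical names, the values used are not taken from the paper, and the learned overlap embeddings that NOVIS uses for tracking between clips are not modeled here.

```python
from typing import List, Tuple


def clip_windows(num_frames: int, clip_len: int, stride: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges that cover a video.

    Hypothetical helper for illustration:
      - clip_len == 1 (stride 1) resembles online, frame-by-frame processing
      - 1 < clip_len < num_frames resembles near-online processing on short clips
      - clip_len >= num_frames resembles offline processing of the whole video
    Consecutive clips share (clip_len - stride) frames when stride < clip_len.
    """
    if num_frames <= 0:
        return []
    if not 0 < stride <= clip_len:
        # A stride larger than the clip length would skip frames entirely.
        raise ValueError("require 0 < stride <= clip_len")

    windows = []
    start = 0
    while True:
        end = min(start + clip_len, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


# Assumed toy video of 10 frames (not a setting from the paper).
online = clip_windows(10, clip_len=1, stride=1)
near_online = clip_windows(10, clip_len=4, stride=2)
offline = clip_windows(10, clip_len=10, stride=10)
```

With these assumed settings, the near-online case yields clips that each share two frames with their neighbor. Shared frames of this kind are what give a method something to associate instances across clip boundaries.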

Results

| Task                        | Dataset                | Metric  | Value | Model             |
|-----------------------------|------------------------|---------|-------|-------------------|
| Video Instance Segmentation | YouTube-VIS 2021       | AP50    | 82    | NOVIS (Swin-L)    |
| Video Instance Segmentation | YouTube-VIS 2021       | AP75    | 66.5  | NOVIS (Swin-L)    |
| Video Instance Segmentation | YouTube-VIS 2021       | AR1     | 47.9  | NOVIS (Swin-L)    |
| Video Instance Segmentation | YouTube-VIS 2021       | AR10    | 64.4  | NOVIS (Swin-L)    |
| Video Instance Segmentation | YouTube-VIS 2021       | mask AP | 59.8  | NOVIS (Swin-L)    |
| Video Instance Segmentation | YouTube-VIS 2021       | AP50    | 69.4  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS 2021       | AP75    | 50    | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS 2021       | AR1     | 41.3  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS 2021       | AR10    | 54.4  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS 2021       | mask AP | 47.2  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | AP50    | 75.7  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | AP75    | 56.9  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | AR1     | 50.3  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | AR10    | 60.6  | NOVIS (ResNet-50) |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 52.8  | NOVIS (ResNet-50) |
| Video Instance Segmentation | OVIS validation        | AP50    | 68.3  | NOVIS (Swin-L)    |
| Video Instance Segmentation | OVIS validation        | AP75    | 43.8  | NOVIS (Swin-L)    |
| Video Instance Segmentation | OVIS validation        | AR1     | 19.4  | NOVIS (Swin-L)    |
| Video Instance Segmentation | OVIS validation        | AR10    | 46.9  | NOVIS (Swin-L)    |
| Video Instance Segmentation | OVIS validation        | mask AP | 43.5  | NOVIS (Swin-L)    |
| Video Instance Segmentation | OVIS validation        | AP50    | 56.2  | NOVIS (ResNet-50) |
| Video Instance Segmentation | OVIS validation        | AP75    | 32.6  | NOVIS (ResNet-50) |
| Video Instance Segmentation | OVIS validation        | AR1     | 15.7  | NOVIS (ResNet-50) |
| Video Instance Segmentation | OVIS validation        | AR10    | 37.1  | NOVIS (ResNet-50) |
| Video Instance Segmentation | OVIS validation        | mask AP | 32.7  | NOVIS (ResNet-50) |
