Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Generalized Framework for Video Instance Segmentation

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

Published: 2022-11-16
Venue: CVPR 2023
Tasks: Semantic Segmentation, Instance Segmentation, Video Instance Segmentation

Abstract

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.

Results

Task                         Dataset           Metric    Value   Model
Video Instance Segmentation  YouTube-VIS 2021  AP50      80.9    GenVIS (Swin-L)
Video Instance Segmentation  YouTube-VIS 2021  AP75      66.5    GenVIS (Swin-L)
Video Instance Segmentation  YouTube-VIS 2021  AR1       49.1    GenVIS (Swin-L)
Video Instance Segmentation  YouTube-VIS 2021  AR10      64.7    GenVIS (Swin-L)
Video Instance Segmentation  YouTube-VIS 2021  mask AP   60.1    GenVIS (Swin-L)
Video Instance Segmentation  OVIS validation   AP50      69.2    GenVIS (Swin-L)
Video Instance Segmentation  OVIS validation   AP75      47.8    GenVIS (Swin-L)
Video Instance Segmentation  OVIS validation   AR1       18.9    GenVIS (Swin-L)
Video Instance Segmentation  OVIS validation   AR10      49      GenVIS (Swin-L)
Video Instance Segmentation  OVIS validation   mask AP   45.4    GenVIS (Swin-L)
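
The AP50 and AP75 rows above count a predicted instance as correct only if its mask overlap with a ground-truth instance reaches an IoU of 0.50 or 0.75, respectively. In video instance segmentation this overlap is usually measured over the whole track, summing intersections and unions across frames. The following is a minimal sketch of that matching criterion only, not the benchmarks' official evaluation code: the names `video_mask_iou` and `is_match` and the toy 2x2 masks are illustrative assumptions, and the ranking of predictions by confidence that a full AP computation requires is omitted.

```python
from typing import List

Mask = List[List[int]]   # one frame: rows of 0/1 pixel values
Track = List[Mask]       # one instance's masks across all frames


def video_mask_iou(pred: Track, gt: Track) -> float:
    """Spatio-temporal IoU: sum intersections and unions over every frame."""
    if len(pred) != len(gt):
        raise ValueError("tracks must cover the same number of frames")
    inter = 0
    union = 0
    for pred_frame, gt_frame in zip(pred, gt):
        for pred_row, gt_row in zip(pred_frame, gt_frame):
            for p, g in zip(pred_row, gt_row):
                inter += 1 if (p and g) else 0
                union += 1 if (p or g) else 0
    # Two empty tracks have no defined overlap; treat it as 0 here (assumption).
    return inter / union if union else 0.0


def is_match(pred: Track, gt: Track, threshold: float) -> bool:
    """True if the predicted track overlaps the ground truth at the given IoU."""
    return video_mask_iou(pred, gt) >= threshold


# Toy two-frame example (hypothetical data, 2x2 pixels per frame).
gt = [[[1, 1], [1, 0]], [[1, 1], [0, 0]]]
pred = [[[1, 1], [1, 0]], [[1, 0], [0, 1]]]

iou = video_mask_iou(pred, gt)       # 4 shared pixels / 6 pixels in the union
match_50 = is_match(pred, gt, 0.50)  # counts toward AP50
match_75 = is_match(pred, gt, 0.75)  # does not count toward AP75
```

In this toy case the prediction is accurate in the first frame and drifts in the second, so it passes the looser 0.50 threshold but fails the stricter 0.75 one. That difference is why AP75 values in the table are lower than the corresponding AP50 values.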

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)