


Video Instance Segmentation using Inter-Frame Communication Transformers

Sukjun Hwang, Miran Heo, Seoung Wug Oh, Seon Joo Kim

2021-06-07 | NeurIPS 2021 12 | Instance Segmentation | Video Instance Segmentation

Abstract

We propose a novel end-to-end solution for video instance segmentation (VIS) based on transformers. Recently, the per-clip pipeline has shown superior performance over per-frame methods by leveraging richer information from multiple frames. However, previous per-clip models require heavy computation and memory usage to achieve frame-to-frame communication, which limits their practicality. In this work, we propose Inter-frame Communication Transformers (IFC), which significantly reduce the overhead of passing information between frames by efficiently encoding the context within the input clip. Specifically, we propose to use concise memory tokens as a means of conveying information as well as summarizing each frame's scene. The features of each frame are enriched and correlated with other frames through the exchange of information between the precisely encoded memory tokens. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (AP 44.6 on the YouTube-VIS 2019 val set using offline inference) while maintaining a considerably fast runtime (89.4 FPS). Our method can also be applied to near-online inference for processing a video in real time with only a small delay. The code will be made available.
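The memory-token idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attend`, `ifc_layer`, `attention_pairs`) are hypothetical, and the attention here is single-head with no learned projections, normalization, or feed-forward layers. It only shows the information flow under those assumptions: each frame first mixes its own features with its own memory tokens, then the memory tokens of all frames exchange information with each other.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Simplified dot-product attention with a residual connection.

    Assumption: no learned query/key/value projections, one head only.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        mixed = [sum(w * kv[j] for w, kv in zip(weights, keys_values))
                 for j in range(d)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])
    return out


def ifc_layer(frames, memories):
    """One illustrative layer of memory-token communication.

    frames:   T frames, each a list of N feature vectors of length D
    memories: T frames, each a list of M memory-token vectors of length D
    """
    # Step 1: per frame, features and memory tokens attend to each other.
    new_frames, new_memories = [], []
    for feats, mem in zip(frames, memories):
        tokens = feats + mem
        updated = attend(tokens, tokens)
        new_frames.append(updated[:len(feats)])
        new_memories.append(updated[len(feats):])

    # Step 2: memory tokens from all frames attend to each other,
    # which is the only place where information crosses frames.
    m = len(memories[0])
    flat = [tok for mem in new_memories for tok in mem]
    exchanged = attend(flat, flat)
    new_memories = [exchanged[i * m:(i + 1) * m] for i in range(len(frames))]
    return new_frames, new_memories


def attention_pairs(t, n, m):
    """Query-key pairs: full clip attention vs. the memory-token scheme."""
    full = (t * n) ** 2
    with_memory = t * (n + m) ** 2 + (t * m) ** 2
    return full, with_memory


# Usage: 3 frames, 4 feature tokens and 2 memory tokens each, dimension 5.
random.seed(0)
T, N, M, D = 3, 4, 2, 5
frames = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
          for _ in range(T)]
memories = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]
            for _ in range(T)]
out_frames, out_memories = ifc_layer(frames, memories)
print(attention_pairs(5, 100, 8))  # (250000, 59920)
```

In this sketch, a single layer lets frame features see other frames only indirectly: the exchanged memory tokens carry cross-frame context into the next layer's per-frame attention. The pair count shows why this is cheaper than attending over every token in the clip when the number of memory tokens is much smaller than the number of frame tokens.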

Results

Task                         Dataset                 Metric   Value  Model
Video Instance Segmentation  YouTube-VIS validation  AP50     65.8   IFC (ResNet-50)
Video Instance Segmentation  YouTube-VIS validation  AP75     46.8   IFC (ResNet-50)
Video Instance Segmentation  YouTube-VIS validation  AR1      43.8   IFC (ResNet-50)
Video Instance Segmentation  YouTube-VIS validation  AR10     51.2   IFC (ResNet-50)
Video Instance Segmentation  YouTube-VIS validation  mask AP  42.8   IFC (ResNet-50)
