Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy

2023-03-22 · ICCV 2023
Tasks: Video Panoptic Segmentation, Segmentation, Video Segmentation, Contrastive Learning, Video Semantic Segmentation, Video Instance Segmentation
Paper · PDF · Code (official)

Abstract

Video segmentation aims to segment and track every pixel accurately across diverse scenarios. In this paper, we present Tube-Link, a versatile framework that addresses multiple core video segmentation tasks with a unified architecture. Our framework is a near-online approach: it takes a short subclip as input and outputs the corresponding spatio-temporal tube masks. To better model cross-tube relationships, we propose an effective way to perform tube-level linking via attention along the queries. In addition, we introduce temporal contrastive learning to learn instance-wise discriminative features for tube-level association. Our approach is flexible and efficient for both short and long video inputs, because the length of each subclip can be varied according to the needs of the dataset or scenario. Tube-Link outperforms existing specialized architectures by a significant margin on five video segmentation datasets. Specifically, over the strong Video K-Net baseline, it achieves a relative improvement of almost 13% on VIPSeg and an improvement of 4% on KITTI-STEP. With a ResNet-50 backbone on YouTube-VIS 2019 and 2021, Tube-Link improves on IDOL by 3% and 4%, respectively.
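
The abstract describes the pipeline only at a high level. The pure-Python sketch below illustrates three of its ideas under stated assumptions, and is not the authors' implementation: splitting a video into fixed-length subclips (non-overlapping here, an assumption), a temporal contrastive objective on tube embeddings (an InfoNCE-style loss with temperature `tau` is swapped in as a stand-in), and linking tubes across consecutive subclips by embedding similarity (greedy matching with a similarity `threshold`, also an assumption). The names `split_into_subclips`, `temporal_contrastive_loss`, and `link_tubes` and all their parameters are hypothetical, and the query-attention linking module is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def split_into_subclips(num_frames: int, clip_len: int) -> List[Tuple[int, int]]:
    """Split frames [0, num_frames) into consecutive, non-overlapping subclips
    of at most clip_len frames (hypothetical helper; overlap is not modeled)."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [(s, min(s + clip_len, num_frames)) for s in range(0, num_frames, clip_len)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def temporal_contrastive_loss(anchor: Vector, positive: Vector,
                              negatives: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE-style stand-in (not necessarily the paper's exact loss): low when
    the anchor tube embedding is closer to the same instance's embedding from
    another subclip than to the embeddings of other instances."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def link_tubes(prev: Dict[int, Vector], curr: List[Vector], next_id: int,
               threshold: float = 0.5) -> Tuple[List[int], int]:
    """Assign track ids to the current subclip's tubes by greedily matching
    their embeddings to the previous subclip's tubes, most similar pair first.
    Tubes left unmatched start new tracks numbered from next_id."""
    pairs = sorted(
        ((cosine(emb, p_emb), i, tid)
         for i, emb in enumerate(curr) for tid, p_emb in prev.items()),
        reverse=True,
    )
    ids = [-1] * len(curr)
    used = set()
    for sim, i, tid in pairs:
        if sim < threshold:
            break  # pairs are sorted, so all remaining pairs are less similar
        if ids[i] == -1 and tid not in used:
            ids[i] = tid
            used.add(tid)
    for i, tid in enumerate(ids):
        if tid == -1:
            ids[i] = next_id
            next_id += 1
    return ids, next_id


print(split_into_subclips(7, 3))  # [(0, 3), (3, 6), (6, 7)]
prev = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # track id -> tube embedding from subclip t-1
curr = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]  # tube embeddings from subclip t
print(link_tubes(prev, curr, next_id=2))  # ([1, 0, 2], 3)
```

Greedy matching keeps the sketch dependency-free; a bipartite (Hungarian) assignment is a common alternative when globally optimal one-to-one matches are wanted.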

Results

Video Instance Segmentation:

| Dataset | Model | mask AP | AP50 | AP75 | AR1 | AR10 |
|---|---|---|---|---|---|---|
| YouTube-VIS 2021 | Tube-Link (Swin-L) | 58.4 | 79.4 | 64.3 | 47.5 | 63.6 |
| YouTube-VIS validation | Tube-Link | 64.6 | 86.6 | 71.3 | 55.9 | 69.1 |
| OVIS validation | Tube-Link (ResNet-50) | 29.5 | 51.5 | 30.2 | 15.5 | 34.5 |

Semantic, panoptic, and scene-level segmentation:

| Dataset | Metric | Value | Model | Listed under |
|---|---|---|---|---|
| VSPW | mIoU | 59.6 | Tube-Link (Swin-large) | Scene Parsing; Video Semantic Segmentation; Scene Understanding; 2D Semantic Segmentation |
| VIPSeg | STQ | 49.4 | Tube-Link (Swin-base) | Semantic Segmentation; Panoptic Segmentation |
| VIPSeg | VPQ | 50.4 | Tube-Link (Swin-base) | Semantic Segmentation; Panoptic Segmentation |
| KITTI-STEP | STQ | 72 | Tube-Link (Swin-base) | Semantic Segmentation; Panoptic Segmentation |
| KITTI-STEP | AQ | 69 | Tube-Link (Swin-base) | Semantic Segmentation; Panoptic Segmentation |
| KITTI-STEP | SQ | 74 | Tube-Link (Swin-base) | Semantic Segmentation; Panoptic Segmentation |
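
For reference, STQ (Segmentation and Tracking Quality), reported for KITTI-STEP and VIPSeg, is the geometric mean of association quality (AQ) and segmentation quality (SQ), where SQ is a semantic mean IoU. The sketch below computes a simple mIoU over flat per-pixel label lists and then STQ from AQ and SQ. The function names are hypothetical, skipping classes absent from both prediction and ground truth is an assumed convention, and real benchmark tooling accumulates counts over the whole dataset rather than over one list.

```python
import math
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes from flat per-pixel labels.
    Classes present in neither pred nor gt are skipped (an assumed
    convention). Returns a fraction in [0, 1]."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def stq(aq: float, sq: float) -> float:
    """Segmentation and Tracking Quality: geometric mean of AQ and SQ,
    with both scores on the same scale (fractions or percentages)."""
    if aq < 0 or sq < 0:
        raise ValueError("AQ and SQ must be non-negative")
    return math.sqrt(aq * sq)


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # (1/2 + 2/3) / 2
print(round(stq(69.0, 74.0), 2))  # 71.46
```

Applied to the rounded KITTI-STEP values listed above (AQ 69, SQ 74), this gives about 71.5, close to the listed STQ of 72; the gap is within what rounding of the inputs can explain. Because STQ is a geometric mean, a method cannot reach a high score by excelling at only segmentation or only association.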

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
- Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)