TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/A Simple Video Segmenter by Tracking Objects Along Axial T...

A Simple Video Segmenter by Tracking Objects Along Axial Trajectories

Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Alan Yuille, Xiaohui Shen, Liang-Chieh Chen

2023-11-30Panoptic SegmentationVideo Panoptic SegmentationSegmentationVideo SegmentationObject TrackingVideo Semantic SegmentationVideo Classification
PaperPDFCode(official)Code(official)

Abstract

Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges, often leading to insufficient GPU memory capacity. Consequently, modern video segmenters either extend an image segmenter without incorporating any temporal attention or resort to window space-time attention in a naive manner. In this work, we present Axial-VS, a general and simple framework that enhances video segmenters by tracking objects along axial trajectories. The framework tackles video segmentation through two sub-tasks: short-term within-clip segmentation and long-term cross-clip tracking. In the first step, Axial-VS augments an off-the-shelf clip-level video segmenter with the proposed axial-trajectory attention, sequentially tracking objects along the height- and width-trajectories within a clip, thereby enhancing temporal consistency by capturing motion trajectories. The axial decomposition significantly reduces the computational complexity for dense features, and outperforms the window space-time attention in segmentation quality. In the second step, we further employ axial-trajectory attention to the object queries in clip-level segmenters, which are learned to encode object information, thereby aiding object tracking across different clips and achieving consistent segmentation throughout the video. Without bells and whistles, Axial-VS showcases state-of-the-art results on video segmentation benchmarks, emphasizing its effectiveness in addressing the limitations of modern clip-level video segmenters. Code and models are available at https://github.com/TACJu/Axial-VS.

Results

TaskDatasetMetricValueModel
Semantic SegmentationVIPSegVPQ58MaXTron (ConvNeXtV2-L)
Semantic SegmentationVIPSegVPQ57.1MaXTron (ConvNeXt-L)
Semantic SegmentationVIPSegVPQ46.7MaXTron (ResNet-50)
10-shot image generationVIPSegVPQ58MaXTron (ConvNeXtV2-L)
10-shot image generationVIPSegVPQ57.1MaXTron (ConvNeXt-L)
10-shot image generationVIPSegVPQ46.7MaXTron (ResNet-50)
Panoptic SegmentationVIPSegVPQ58MaXTron (ConvNeXtV2-L)
Panoptic SegmentationVIPSegVPQ57.1MaXTron (ConvNeXt-L)
Panoptic SegmentationVIPSegVPQ46.7MaXTron (ResNet-50)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction2025-07-21Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction2025-07-17DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model2025-07-17From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation2025-07-17Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion2025-07-17SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation2025-07-17Unified Medical Image Segmentation with State Space Modeling Snake2025-07-17A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique2025-07-17