
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

2021-06-22 · NeurIPS 2021
Tasks: Multiple Object Tracking and Segmentation · Multi-Object Tracking and Segmentation · Segmentation · Object Tracking · Multiple Object Tracking · Video Instance Segmentation

Paper · PDF · Code (official)

Abstract

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both Youtube-VIS and BDD100K datasets, and shows efficacy to both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
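To make the abstract's two-step idea concrete (distill a space-time memory into prototypes, then cross-attend from the current frame to those prototypes), here is a minimal PyTorch sketch. This is not the authors' implementation: the soft k-means (EM-style) distillation, the prototype count, and all shapes are illustrative assumptions; the official code is at http://vis.xyz/pub/pcan.

```python
# Minimal sketch of prototypical cross-attention, in the spirit of PCAN.
# Assumptions (not from the paper's code): soft k-means distillation,
# K=64 prototypes, single-head dot-product attention.
import torch

def distill_prototypes(memory, num_prototypes=64, iters=3):
    """Compress space-time memory features into a few prototypes via
    EM-style soft-assignment iterations (assumed stand-in for the
    paper's clustering step).

    memory: (N, C) features from past frames, flattened over space and time.
    returns: (K, C) prototypes.
    """
    N, C = memory.shape
    # Initialize prototypes from a random subset of memory tokens.
    idx = torch.randperm(N)[:num_prototypes]
    protos = memory[idx].clone()                          # (K, C)
    for _ in range(iters):
        # E-step: soft-assign each memory token to the prototypes.
        logits = memory @ protos.t() / C ** 0.5           # (N, K)
        assign = logits.softmax(dim=1)                    # (N, K)
        # M-step: prototypes become assignment-weighted means of tokens.
        protos = (assign.t() @ memory) / (assign.sum(dim=0).unsqueeze(1) + 1e-6)
    return protos

def prototypical_cross_attention(query, protos):
    """Cross-attend from current-frame features to distilled prototypes,
    retrieving temporal context at O(M*K) cost instead of O(M*N).

    query:  (M, C) current-frame features (one token per location).
    protos: (K, C) distilled prototypes.
    returns: (M, C) context-enriched features.
    """
    C = query.shape[1]
    attn = (query @ protos.t() / C ** 0.5).softmax(dim=1)  # (M, K)
    return attn @ protos                                   # (M, C)

# Toy usage: two past 32x32 feature maps (C=256) as memory, one current frame.
mem = torch.randn(2 * 32 * 32, 256)
cur = torch.randn(32 * 32, 256)
out = prototypical_cross_attention(cur, distill_prototypes(mem))
print(out.shape)  # torch.Size([1024, 256])
```

The point of the distillation step is the cost reduction noted above: attending to a handful of prototypes instead of every past-frame token keeps the online setting tractable while still summarizing the space-time memory.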

Results

Task                                   | Dataset                | Metric  | Value | Model
Video                                  | BDD100K val            | mMOTSA  | 27.4  | PCAN
Object Tracking                        | BDD100K val            | mMOTSA  | 27.4  | PCAN
Video Instance Segmentation            | YouTube-VIS validation | AP50    | 54.9  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | AP75    | 39.4  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | AR1     | 36.3  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | AR10    | 41.6  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | mask AP | 36.1  | PCAN (ResNet-50)
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 27.4  | PCAN
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 23.5  | QDTrack-mots-fix
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 22.5  | QDTrack-mots
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 12.3  | MaskTrackRCNN
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 12.2  | STEm-Seg
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 10.3  | SortIoU
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 27.4  | PCAN
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 23.5  | QDTrack-mots-fix
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 22.5  | QDTrack-mots
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 12.3  | MaskTrackRCNN
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 12.2  | STEm-Seg
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 10.3  | SortIoU
Multiple Object Tracking               | BDD100K val            | mMOTSA  | 27.4  | PCAN
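The BDD100K val rows all report mMOTSA, i.e., MOTSA (multi-object tracking and segmentation accuracy) averaged over object classes. As a quick reference, here is a minimal sketch of the score following the MOTS formulation of Voigtlaender et al.; the per-class TP/FP/ID-switch counts are assumed to come from a mask-IoU matcher, which is outside this sketch.

```python
def motsa(num_tp, num_fp, num_ids, num_gt):
    """Per-class MOTSA, following the MOTS formulation:
    MOTSA = (|TP| - |FP| - |IDS|) / |GT|.
    Counts must come from a mask-IoU matcher (not shown here).
    """
    return (num_tp - num_fp - num_ids) / num_gt

def m_motsa(per_class_counts):
    """mMOTSA: MOTSA averaged over classes.
    per_class_counts: dict of class name -> (tp, fp, ids, gt).
    """
    scores = [motsa(*counts) for counts in per_class_counts.values()]
    return sum(scores) / len(scores)

# Toy example with two hypothetical classes:
print(m_motsa({"car": (80, 10, 2, 100), "pedestrian": (40, 8, 3, 60)}))
```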

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)