
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

2021-06-22 · NeurIPS 2021
Tasks: Multiple Object Tracking and Segmentation · Multi-Object Tracking and Segmentation · Segmentation · Object Tracking · Multiple Object Tracking · Video Instance Segmentation

Paper · PDF · Code (official)

Abstract

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both Youtube-VIS and BDD100K datasets, and shows efficacy to both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
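To make the abstract's two-step idea concrete (distill a space-time memory into prototypes, then cross-attend from the current frame to those prototypes), here is a minimal PyTorch sketch. This is not the authors' implementation: the soft k-means (EM-style) distillation, the prototype count, and all shapes are illustrative assumptions; the official code is at http://vis.xyz/pub/pcan.

```python
# Minimal sketch of prototypical cross-attention, in the spirit of PCAN.
# Assumptions (not from the paper's code): soft k-means distillation,
# K=64 prototypes, single-head dot-product attention.
import torch

def distill_prototypes(memory, num_prototypes=64, iters=3):
    """Compress space-time memory features into a few prototypes via
    EM-style soft-assignment iterations (assumed stand-in for the
    paper's clustering step).

    memory: (N, C) features from past frames, flattened over space and time.
    returns: (K, C) prototypes.
    """
    N, C = memory.shape
    # Initialize prototypes from a random subset of memory tokens.
    idx = torch.randperm(N)[:num_prototypes]
    protos = memory[idx].clone()                          # (K, C)
    for _ in range(iters):
        # E-step: soft-assign each memory token to the prototypes.
        logits = memory @ protos.t() / C ** 0.5           # (N, K)
        assign = logits.softmax(dim=1)                    # (N, K)
        # M-step: prototypes become assignment-weighted means of tokens.
        protos = (assign.t() @ memory) / (assign.sum(dim=0).unsqueeze(1) + 1e-6)
    return protos

def prototypical_cross_attention(query, protos):
    """Cross-attend from current-frame features to distilled prototypes,
    retrieving temporal context at O(M*K) cost instead of O(M*N).

    query:  (M, C) current-frame features (one token per location).
    protos: (K, C) distilled prototypes.
    returns: (M, C) context-enriched features.
    """
    C = query.shape[1]
    attn = (query @ protos.t() / C ** 0.5).softmax(dim=1)  # (M, K)
    return attn @ protos                                   # (M, C)

# Toy usage: two past 32x32 feature maps (C=256) as memory, one current frame.
mem = torch.randn(2 * 32 * 32, 256)
cur = torch.randn(32 * 32, 256)
out = prototypical_cross_attention(cur, distill_prototypes(mem))
print(out.shape)  # torch.Size([1024, 256])
```

The point of the distillation step is the cost reduction noted above: attending to a handful of prototypes instead of every past-frame token keeps the online setting tractable while still summarizing the space-time memory.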

Results

Task                                   | Dataset                | Metric  | Value | Model
Video                                  | BDD100K val            | mMOTSA  | 27.4  | PCAN
Object Tracking                        | BDD100K val            | mMOTSA  | 27.4  | PCAN
Video Instance Segmentation            | YouTube-VIS validation | AP50    | 54.9  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | AP75    | 39.4  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | AR1     | 36.3  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | AR10    | 41.6  | PCAN (ResNet-50)
Video Instance Segmentation            | YouTube-VIS validation | mask AP | 36.1  | PCAN (ResNet-50)
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 27.4  | PCAN
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 23.5  | QDTrack-mots-fix
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 22.5  | QDTrack-mots
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 12.3  | MaskTrackRCNN
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 12.2  | STEm-Seg
Video Instance Segmentation            | BDD100K val            | mMOTSA  | 10.3  | SortIoU
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 27.4  | PCAN
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 23.5  | QDTrack-mots-fix
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 22.5  | QDTrack-mots
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 12.3  | MaskTrackRCNN
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 12.2  | STEm-Seg
Multi-Object Tracking and Segmentation | BDD100K val            | mMOTSA  | 10.3  | SortIoU
Multiple Object Tracking               | BDD100K val            | mMOTSA  | 27.4  | PCAN
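The BDD100K val rows all report mMOTSA, i.e., MOTSA (multi-object tracking and segmentation accuracy) averaged over object classes. As a quick reference, here is a minimal sketch of the score following the MOTS formulation of Voigtlaender et al.; the per-class TP/FP/ID-switch counts are assumed to come from a mask-IoU matcher, which is outside this sketch.

```python
def motsa(num_tp, num_fp, num_ids, num_gt):
    """Per-class MOTSA, following the MOTS formulation:
    MOTSA = (|TP| - |FP| - |IDS|) / |GT|.
    Counts must come from a mask-IoU matcher (not shown here).
    """
    return (num_tp - num_fp - num_ids) / num_gt

def m_motsa(per_class_counts):
    """mMOTSA: MOTSA averaged over classes.
    per_class_counts: dict of class name -> (tp, fp, ids, gt).
    """
    scores = [motsa(*counts) for counts in per_class_counts.values()]
    return sum(scores) / len(scores)

# Toy example with two hypothetical classes:
print(m_motsa({"car": (80, 10, 2, 100), "pedestrian": (40, 8, 3, 60)}))
```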

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)