Crossover Learning for Fast Online Video Instance Segmentation

Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

Date: 2021-04-13
Venue: ICCV 2021
Tasks: Semantic Segmentation, Instance Segmentation, Video Understanding, Video Instance Segmentation

Abstract

Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance pixel-wise in other frames. Unlike previous schemes, crossover learning does not require any additional network parameters for feature enhancement. By integrating with the instance segmentation loss, crossover learning enables efficient cross-frame instance-to-pixel relation learning and brings cost-free improvement during inference. In addition, a global balanced instance embedding branch is proposed for more accurate and more stable online instance association. We conduct extensive experiments on three challenging VIS benchmarks, i.e., YouTube-VIS-2019, OVIS, and YouTube-VIS-2021, to evaluate our method. To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy. Code will be available to facilitate future research.
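
The crossover idea described above can be illustrated with a small NumPy sketch. It is not the CrossVIS implementation: the abstract does not say how the instance feature is applied or which loss is used, so the sketch assumes the instance feature acts as a per-pixel dot-product filter and that masks are supervised with a Dice loss. All names (`predict_mask`, `dice_loss`, `crossover_loss`) are hypothetical.

```python
import numpy as np


def predict_mask(instance_feature, frame_features):
    """Correlate a (C,) instance feature with a (C, H, W) feature map.

    Assumption: the instance feature is used as a 1x1 filter, i.e. a dot
    product at every pixel, followed by a sigmoid. Returns (H, W) probabilities.
    """
    logits = np.einsum("c,chw->hw", instance_feature, frame_features)
    return 1.0 / (1.0 + np.exp(-logits))


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a predicted and a ground-truth mask (assumed loss)."""
    inter = (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum()
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def crossover_loss(feat_a, feat_b, inst_a, inst_b, mask_a, mask_b):
    """Segmentation loss for one instance seen in two frames, a and b.

    The within-frame terms are the usual per-frame segmentation loss. The
    crossover terms apply the instance feature from one frame to the feature
    map of the other frame and supervise with that other frame's mask, so no
    extra network parameters are involved.
    """
    within = (dice_loss(predict_mask(inst_a, feat_a), mask_a)
              + dice_loss(predict_mask(inst_b, feat_b), mask_b))
    cross = (dice_loss(predict_mask(inst_a, feat_b), mask_b)
             + dice_loss(predict_mask(inst_b, feat_a), mask_a))
    return within + cross


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_a, feat_b = rng.normal(size=(2, 8, 16, 16))   # two (C, H, W) maps
    inst_a, inst_b = rng.normal(size=(2, 8))           # two (C,) features
    mask_a = (rng.random((16, 16)) > 0.5).astype(float)
    mask_b = (rng.random((16, 16)) > 0.5).astype(float)
    print(crossover_loss(feat_a, feat_b, inst_a, inst_b, mask_a, mask_b))
```

In this sketch the crossover terms only add loss computations during training; prediction on a single frame still uses `predict_mask` alone, which is consistent with the abstract's claim of no additional inference cost.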

Results

Task	Dataset	Metric	Value	Model
Video Instance Segmentation	YouTube-VIS validation	AP50	57.3	CrossVIS (ResNet-101)
Video Instance Segmentation	YouTube-VIS validation	AP75	39.7	CrossVIS (ResNet-101)
Video Instance Segmentation	YouTube-VIS validation	AR1	36	CrossVIS (ResNet-101)
Video Instance Segmentation	YouTube-VIS validation	AR10	42	CrossVIS (ResNet-101)
Video Instance Segmentation	YouTube-VIS validation	mask AP	36.6	CrossVIS (ResNet-101)
Video Instance Segmentation	OVIS validation	AP50	35.5	CrossVIS (ResNet-50, calibration)
Video Instance Segmentation	OVIS validation	AP75	16.9	CrossVIS (ResNet-50, calibration)
Video Instance Segmentation	OVIS validation	mask AP	18.1	CrossVIS (ResNet-50, calibration)
Video Instance Segmentation	OVIS validation	AP50	32.7	CrossVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	AP75	12.1	CrossVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	mask AP	14.9	CrossVIS (ResNet-50)
