Occluded Video Instance Segmentation: A Benchmark

Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

2021-02-02

Segmentation, Semantic Segmentation, Instance Segmentation, Video Understanding, Video Instance Segmentation

Abstract

Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion. Built upon MaskTrack R-CNN and SipMask, we obtain a remarkable AP improvement on the OVIS dataset. The OVIS dataset and project code are available at http://songbai.site/ovis .

Results

Task	Dataset	Metric	Value	Model
Video Instance Segmentation	YouTube-VIS validation	AP50	55.6	CSipMask
Video Instance Segmentation	YouTube-VIS validation	AP75	38.1	CSipMask
Video Instance Segmentation	YouTube-VIS validation	mask AP	35.1	CSipMask
Video Instance Segmentation	YouTube-VIS validation	AP50	52.8	CMaskTrack R-CNN
Video Instance Segmentation	YouTube-VIS validation	AP75	34.9	CMaskTrack R-CNN
Video Instance Segmentation	YouTube-VIS validation	mask AP	32.1	CMaskTrack R-CNN
Video Instance Segmentation	OVIS validation	AP50	33.9	CMaskTrack R-CNN (ResNet-50)
Video Instance Segmentation	OVIS validation	AP75	13.1	CMaskTrack R-CNN (ResNet-50)
Video Instance Segmentation	OVIS validation	APho	4.1	CMaskTrack R-CNN (ResNet-50)
Video Instance Segmentation	OVIS validation	APmo	18.7	CMaskTrack R-CNN (ResNet-50)
Video Instance Segmentation	OVIS validation	APso	28.6	CMaskTrack R-CNN (ResNet-50)
Video Instance Segmentation	OVIS validation	mask AP	15.4	CMaskTrack R-CNN (ResNet-50)
Video Instance Segmentation	OVIS validation	AP50	29.9	CSipMask (ResNet-50)
Video Instance Segmentation	OVIS validation	AP75	12.5	CSipMask (ResNet-50)
Video Instance Segmentation	OVIS validation	APho	2.7	CSipMask (ResNet-50)
Video Instance Segmentation	OVIS validation	APmo	12.8	CSipMask (ResNet-50)
Video Instance Segmentation	OVIS validation	APso	23.0	CSipMask (ResNet-50)
Video Instance Segmentation	OVIS validation	mask AP	14.3	CSipMask (ResNet-50)
