In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

2022-07-21

Tasks: Segmentation, Semantic Segmentation, Video Object Segmentation, Contrastive Learning, Instance Segmentation, Video Semantic Segmentation, Video Instance Segmentation


Abstract

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance. However, online methods have an inherent advantage in handling long video sequences and ongoing videos, where offline models fail due to limited computational resources. Therefore, it would be highly desirable if online models could achieve comparable or even better performance than offline models. By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association between frames, caused by the similar appearance of different instances in the feature space. Based on this observation, we propose an online framework based on contrastive learning that learns more discriminative instance embeddings for association and fully exploits history information for stability. Despite its simplicity, our method outperforms all online and offline methods on three benchmarks. Specifically, we achieve 49.5 AP on YouTube-VIS 2019, a significant improvement of 13.2 AP and 2.1 AP over the prior online and offline art, respectively. Moreover, we achieve 30.2 AP on OVIS, a more challenging dataset with significant crowding and occlusions, surpassing the prior art by 14.8 AP. The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022). We hope the simplicity and effectiveness of our method, as well as our insight into current methods, will shed light on the exploration of VIS models.
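The abstract attributes the online/offline gap to error-prone frame-to-frame association and proposes contrastively learned instance embeddings plus the use of history information. The sketch below illustrates those two ideas in plain Python under our own assumptions; it is not the official IDOL code. The loss is a multi-positive contrastive loss of the form log(1 + sum over positives k+ and negatives k- of exp(v.k- - v.k+)), as used in quasi-dense tracking, and we do not claim it matches the paper's exact formulation. The association step uses a hypothetical `Track` memory whose temporally weighted prototype is matched greedily to new detections by cosine similarity. The names `Track`, `associate`, `memory_len`, and `threshold` are illustrative only, and the greedy matching rule is a simplification rather than the paper's inference procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products become cosine similarities.
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]


def contrastive_loss(key: Vector, positives: List[Vector], negatives: List[Vector]) -> float:
    """Multi-positive contrastive loss (assumed form, not necessarily IDOL's):
    log(1 + sum_{k+} sum_{k-} exp(v.k- - v.k+)).
    It is small when the key is closer to every positive than to every negative."""
    total = 0.0
    for p in positives:
        sp = dot(key, p)
        for n in negatives:
            total += math.exp(dot(key, n) - sp)
    return math.log1p(total)


class Track:
    """Hypothetical track holding a short memory of past embeddings (history)."""

    def __init__(self, track_id: int, embedding: Vector, memory_len: int = 5):
        self.track_id = track_id
        self.memory_len = memory_len
        self.memory: List[List[float]] = [normalize(embedding)]

    def update(self, embedding: Vector) -> None:
        # Keep only the most recent `memory_len` embeddings.
        self.memory.append(normalize(embedding))
        self.memory = self.memory[-self.memory_len:]

    def prototype(self) -> List[float]:
        # Temporally weighted average: the i-th oldest entry gets weight i+1,
        # so recent frames count more than older ones.
        dim = len(self.memory[0])
        acc = [0.0] * dim
        for w, emb in zip(range(1, len(self.memory) + 1), self.memory):
            for j in range(dim):
                acc[j] += w * emb[j]
        return normalize(acc)


def associate(tracks: List[Track], detections: List[Vector], threshold: float = 0.5) -> Dict[int, int]:
    """Hypothetical greedy matcher: pairs detections with tracks by cosine
    similarity (highest first). Returns {detection index: track_id};
    unmatched detections start new tracks."""
    protos = [t.prototype() for t in tracks]
    dets = [normalize(d) for d in detections]
    pairs: List[Tuple[float, int, int]] = [
        (dot(p, d), ti, di) for ti, p in enumerate(protos) for di, d in enumerate(dets)
    ]
    pairs.sort(reverse=True)
    assigned: Dict[int, int] = {}
    used_tracks = set()
    for sim, ti, di in pairs:
        if sim < threshold:
            break  # all remaining pairs are even less similar
        if ti in used_tracks or di in assigned:
            continue
        assigned[di] = tracks[ti].track_id
        used_tracks.add(ti)
        tracks[ti].update(detections[di])
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for di, d in enumerate(detections):
        if di not in assigned:
            tracks.append(Track(next_id, d))
            assigned[di] = next_id
            next_id += 1
    return assigned


if __name__ == "__main__":
    tracks = [Track(0, [1.0, 0.0]), Track(1, [0.0, 1.0])]
    print(associate(tracks, [[0.1, 0.9], [0.95, 0.05], [-1.0, 0.0]]))
    # prints {1: 0, 0: 1, 2: 2}
```

In the usage example, the first two detections are matched to the existing tracks they most resemble, while the third is dissimilar to both and opens a new track. Weighting recent memory entries more heavily is one simple way to let history stabilize matching while still following appearance changes.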

Results

All results are for the Video Instance Segmentation task.

Dataset	Model	mask AP	AP50	AP75	AR1	AR10
YouTube-VIS 2021	IDOL (Swin-L)	56.1	80.8	63.5	45.0	60.1
YouTube-VIS 2019 validation	IDOL (ResNet-50)	49.5	74.0	52.9	47.7	58.7
OVIS validation	IDOL (Swin-L)	42.6	65.7	45.2	17.9	49.6
OVIS validation	IDOL (ResNet-50)	30.2	51.3	30.0	15.0	37.5