NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan

2023-08-29

Segmentation, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation

Abstract

Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to frame-by-frame online processing. However, the recent success of online methods calls this belief into question, particularly for challenging and long video sequences. We understand this work as a rebuttal of those recent observations and an appeal to the community to focus on dedicated near-online VIS approaches. To support our argument, we present a detailed analysis of different processing paradigms and the new end-to-end trainable NOVIS (Near-Online Video Instance Segmentation) method. Our transformer-based model directly predicts spatio-temporal mask volumes for clips of frames and performs instance tracking between clips via overlap embeddings. NOVIS represents the first near-online VIS approach that avoids any handcrafted tracking heuristics. We outperform all existing VIS methods by large margins and provide new state-of-the-art results on both the YouTube-VIS (2019/2021) and OVIS benchmarks.
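To make the three processing paradigms concrete, the following minimal Python sketch enumerates the frame indices a clip-based method would see. It is only an illustration of clip windowing, not the NOVIS implementation: the function name `clip_windows` and the parameters `clip_len` and `stride` are assumptions introduced here, and the sketch does not model the overlap embeddings used for tracking between clips.

```python
from typing import List


def clip_windows(num_frames: int, clip_len: int, stride: int) -> List[List[int]]:
    """Return the frame indices of each clip for a sliding-window pass.

    Hypothetical helper for illustration only:
      clip_len == 1              -> online (frame-by-frame) processing
      clip_len >= num_frames     -> offline processing (whole video at once)
      1 < clip_len < num_frames  -> near-online processing on short clips
    """
    if num_frames < 1 or clip_len < 1 or stride < 1:
        raise ValueError("num_frames, clip_len and stride must be positive")
    if stride > clip_len:
        raise ValueError("stride must not exceed clip_len, or frames are skipped")

    # A clip can never be longer than the video itself.
    clip_len = min(clip_len, num_frames)

    starts = list(range(0, num_frames - clip_len + 1, stride))
    # Add a final clip if the regular stride leaves trailing frames uncovered.
    if starts[-1] + clip_len < num_frames:
        starts.append(num_frames - clip_len)

    return [list(range(s, s + clip_len)) for s in starts]


def shared_frames(clip_a: List[int], clip_b: List[int]) -> List[int]:
    """Frames that two clips have in common (the overlap region)."""
    return sorted(set(clip_a) & set(clip_b))


if __name__ == "__main__":
    clips = clip_windows(num_frames=6, clip_len=3, stride=2)
    for previous, current in zip(clips, clips[1:]):
        print(previous, current, "shared:", shared_frames(previous, current))
```

With a stride smaller than the clip length, consecutive clips share frames. Such a shared region is one place where instance identities could be linked from one clip to the next; how NOVIS actually does this is described in the paper, not here.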

Results

Task	Dataset	Metric	Value	Model
Video Instance Segmentation	YouTube-VIS 2021	AP50	82.0	NOVIS (Swin-L)
Video Instance Segmentation	YouTube-VIS 2021	AP75	66.5	NOVIS (Swin-L)
Video Instance Segmentation	YouTube-VIS 2021	AR1	47.9	NOVIS (Swin-L)
Video Instance Segmentation	YouTube-VIS 2021	AR10	64.4	NOVIS (Swin-L)
Video Instance Segmentation	YouTube-VIS 2021	mask AP	59.8	NOVIS (Swin-L)
Video Instance Segmentation	YouTube-VIS 2021	AP50	69.4	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS 2021	AP75	50.0	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS 2021	AR1	41.3	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS 2021	AR10	54.4	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS 2021	mask AP	47.2	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AP50	75.7	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AP75	56.9	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AR1	50.3	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	AR10	60.6	NOVIS (ResNet-50)
Video Instance Segmentation	YouTube-VIS validation	mask AP	52.8	NOVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	AP50	68.3	NOVIS (Swin-L)
Video Instance Segmentation	OVIS validation	AP75	43.8	NOVIS (Swin-L)
Video Instance Segmentation	OVIS validation	AR1	19.4	NOVIS (Swin-L)
Video Instance Segmentation	OVIS validation	AR10	46.9	NOVIS (Swin-L)
Video Instance Segmentation	OVIS validation	mask AP	43.5	NOVIS (Swin-L)
Video Instance Segmentation	OVIS validation	AP50	56.2	NOVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	AP75	32.6	NOVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	AR1	15.7	NOVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	AR10	37.1	NOVIS (ResNet-50)
Video Instance Segmentation	OVIS validation	mask AP	32.7	NOVIS (ResNet-50)
