ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Published: 2020-12-09
Venue: CVPR 2021
Tasks: Panoptic Segmentation, Video Panoptic Segmentation, Segmentation, Depth Estimation, Depth-aware Video Panoptic Segmentation, Monocular Depth Estimation


Abstract

In this paper, we present ViP-DeepLab, a unified model that attempts to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring point clouds from perspective image sequences while providing each point with an instance-level semantic interpretation. Solving this problem requires a vision model to predict the spatial location, semantic class, and temporally consistent instance label for each 3D point. ViP-DeepLab approaches it by jointly performing monocular depth estimation and video panoptic segmentation. We call this joint task Depth-aware Video Panoptic Segmentation, and propose a new evaluation metric along with two derived datasets for it. On the individual sub-tasks, ViP-DeepLab also achieves state-of-the-art results, outperforming previous methods by 5.1% VPQ on Cityscapes-VPS, ranking 1st on the KITTI monocular depth estimation benchmark, and ranking 1st on KITTI MOTS pedestrian. The datasets and the evaluation code are made publicly available.
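The inverse projection described in the abstract can be illustrated with a small sketch. This is not ViP-DeepLab's code: it only shows how a per-pixel depth map and a panoptic label map, once predicted, could be lifted into a labeled point cloud. It assumes a simple pinhole camera; the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and chosen for illustration.

```python
import numpy as np

def backproject(depth, panoptic, fx, fy, cx, cy):
    """Lift a depth map and a panoptic label map to a labeled 3D point cloud.

    Assumes a pinhole camera model (an assumption of this sketch, not a
    detail taken from the paper). depth and panoptic share shape (H, W).
    """
    h, w = depth.shape
    # v indexes rows (image y), u indexes columns (image x).
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64)
    # Invert the perspective projection u = fx * x / z + cx (same for v).
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    labels = panoptic.reshape(-1)
    # Drop pixels without a valid (positive) depth.
    valid = points[:, 2] > 0
    return points[valid], labels[valid]

# Toy input: a 2x3 image at constant depth with two instance ids.
depth = np.full((2, 3), 2.0)
panoptic = np.array([[1, 1, 2],
                     [1, 2, 2]])
points, labels = backproject(depth, panoptic, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Each row of `points` is an (x, y, z) location in the camera frame, and the matching entry of `labels` carries the panoptic id of the pixel it came from. Keeping those ids consistent across frames is the video part of the task.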

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	Cityscapes-VPS	VPQ	63.1	ViP-DeepLab
Semantic Segmentation	Cityscapes-VPS	VPQ (stuff)	73	ViP-DeepLab
Semantic Segmentation	Cityscapes-VPS	VPQ (thing)	49.5	ViP-DeepLab
Panoptic Segmentation	Cityscapes-VPS	VPQ	63.1	ViP-DeepLab
Panoptic Segmentation	Cityscapes-VPS	VPQ (stuff)	73	ViP-DeepLab
Panoptic Segmentation	Cityscapes-VPS	VPQ (thing)	49.5	ViP-DeepLab
