Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments

Daniel Seichter, Söhnke Benedikt Fischedick, Mona Köhler, Horst-Michael Groß

2022-07-10

Scene Classification, Panoptic Segmentation, Scene Understanding, Segmentation, Scene Classification (unified classes), Semantic Segmentation, Instance Segmentation


Abstract

Semantic scene understanding is essential for mobile agents acting in various environments. Although semantic segmentation already provides a lot of information, details about individual objects as well as the general scene are missing but required for many real-world applications. However, solving multiple tasks separately is expensive and cannot be accomplished in real time given the limited computing and battery capabilities of a mobile platform. In this paper, we propose an efficient multi-task approach for RGB-D scene analysis (EMSANet) that simultaneously performs semantic and instance segmentation (panoptic segmentation), instance orientation estimation, and scene classification. We show that all tasks can be accomplished using a single neural network in real time on a mobile platform without diminishing performance; on the contrary, the individual tasks are able to benefit from each other. In order to evaluate our multi-task approach, we extend the annotations of the common RGB-D indoor datasets NYUv2 and SUNRGB-D for instance segmentation and orientation estimation. To the best of our knowledge, we are the first to provide results in such a comprehensive multi-task setting for indoor scene analysis on NYUv2 and SUNRGB-D.

Results

Task	Dataset	Metric	Value	Model
Panoptic Segmentation	NYU Depth v2	PQ	47.38	EMSANet
Panoptic Segmentation	SUN-RGBD	PQ	52.84	EMSANet
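
The PQ column refers to panoptic quality, which in its standard definition divides the summed IoU of matched segment pairs by |TP| + 0.5 |FP| + 0.5 |FN|, where a predicted and a ground-truth segment count as a match when their IoU exceeds 0.5. The following is a minimal sketch of that formula for a single class, not the evaluation code used for the numbers above. It assumes segments are given as sets of pixel coordinates that do not overlap within one map; the names `iou`, `panoptic_quality`, and the toy segments are hypothetical. The usual benchmark protocol additionally averages PQ over classes and handles void regions, which is omitted here.

```python
from typing import Dict, Hashable, Set, Tuple

Pixel = Tuple[int, int]
Segments = Dict[Hashable, Set[Pixel]]


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred: Segments, gt: Segments) -> float:
    """Single-class PQ: sum of matched IoUs / (TP + 0.5*FP + 0.5*FN)."""
    matched_pred = set()
    iou_sum = 0.0
    tp = 0
    for gt_pixels in gt.values():
        for pred_id, pred_pixels in pred.items():
            if pred_id in matched_pred:
                continue
            value = iou(gt_pixels, pred_pixels)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if value > 0.5:
                matched_pred.add(pred_id)
                iou_sum += value
                tp += 1
                break
    fp = len(pred) - tp  # predicted segments without a match
    fn = len(gt) - tp    # ground-truth segments without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Toy example (hypothetical data): one good match, one miss, one spurious segment.
gt_segments = {"chair": {(0, 0), (0, 1), (0, 2), (0, 3)}, "table": {(1, 0), (1, 1)}}
pred_segments = {"a": {(0, 0), (0, 1), (0, 2)}, "b": {(2, 0)}}
pq = panoptic_quality(pred_segments, gt_segments)
print(pq)  # 0.375
```

In the toy example, the chair is matched with IoU 0.75, the table is missed (one FN), and segment "b" is spurious (one FP), giving 0.75 / (1 + 0.5 + 0.5) = 0.375. Values in the table are on a 0 to 100 scale, so this would correspond to 37.5.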
