SemAttNet: Towards Attention-based Semantic Aware Guided Depth Completion

Danish Nazir, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal

2022-04-28
Depth Completion

Abstract

Depth completion involves recovering a dense depth map from a sparse depth map and an RGB image. Recent approaches use color images as guidance to recover depth at invalid pixels. However, color images alone do not provide the necessary semantic understanding of the scene. Consequently, depth completion suffers from sudden illumination changes in RGB images (e.g., shadows). In this paper, we propose a novel three-branch backbone comprising color-guided, semantic-guided, and depth-guided branches. Specifically, the color-guided branch takes a sparse depth map and an RGB image as input and generates a color depth map, which captures color cues of the scene (e.g., object boundaries). The dense depth map predicted by the color-guided branch, along with the semantic image and the sparse depth map, is passed as input to the semantic-guided branch to estimate semantic depth. The depth-guided branch takes the sparse, color, and semantic depths and generates the guided dense depth map. The color depth, semantic depth, and guided depth are adaptively fused to produce the output of the three-branch backbone. In addition, we propose a semantic-aware multi-modal attention-based fusion block (SAMMAFB) to fuse features between all three branches. We further use CSPN++ with atrous convolutions to refine the dense depth map produced by the three-branch backbone. Extensive experiments show that our model achieves state-of-the-art performance on the KITTI depth completion benchmark at the time of submission.
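The abstract does not say how the color, semantic, and guided depths are "adaptively fused". One common choice in multi-branch depth completion is a per-pixel softmax over confidence maps predicted alongside each depth map. The sketch below illustrates that idea only. The function name `fuse_depths`, the confidence maps, and the softmax weighting are assumptions for illustration, not the authors' confirmed method.

```python
import math


def fuse_depths(depths, confidences):
    """Per-pixel softmax-weighted average of K depth maps (hypothetical helper).

    depths, confidences: lists of K maps, each an H x W nested list of floats.
    Assumption: each branch emits a confidence map alongside its depth map.
    """
    height, width = len(depths[0]), len(depths[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            logits = [conf[y][x] for conf in confidences]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            # Weighted sum of the branch depths at this pixel.
            fused[y][x] = sum(
                (e / total) * depth[y][x] for e, depth in zip(exps, depths)
            )
    return fused


# Three 1x1 "maps" standing in for color, semantic, and guided depth.
color, semantic, guided = [[1.0]], [[2.0]], [[3.0]]
equal_conf = [[[0.0]], [[0.0]], [[0.0]]]
fused_equal = fuse_depths([color, semantic, guided], equal_conf)
```

With equal confidences the result reduces to a plain average of the three branches. A branch whose confidence is much higher than the others dominates the fused value at that pixel.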

Results

Task	Dataset	Metric	Value	Model
Depth Completion	KITTI Depth Completion	RMSE (mm)	709.41	SemAttNet
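For reference, RMSE in depth completion is typically computed only over pixels that have a ground-truth depth value. The following is a minimal sketch of that calculation, assuming flattened depth maps and the convention that a ground-truth value of 0 marks an invalid pixel. The function name `masked_rmse` is hypothetical and this is not the benchmark's official evaluation code.

```python
import math


def masked_rmse(pred, gt):
    """RMSE over pixels with valid ground truth (assumed: gt > 0 means valid).

    pred, gt: flat sequences of depth values in the same unit.
    """
    squared_errors = [(p - g) ** 2 for p, g in zip(pred, gt) if g > 0]
    if not squared_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# The middle pixel has no ground truth and is ignored.
error = masked_rmse([1000.0, 2000.0, 3000.0], [1000.0, 0.0, 3004.0])
```

Here the two valid pixels have errors of 0 and 4, so the result is the square root of 8, in the same unit as the inputs.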
