Sparsity Invariant CNNs

Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, Andreas Geiger

2017-08-22 · Depth Completion · Depth Prediction · Depth Estimation

Abstract

In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth-annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.
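The abstract does not spell out the layer itself, so the following is only a minimal sketch of one way a convolution can "explicitly consider the location of missing data". It assumes a single channel, a binary validity mask, zero padding, and normalization by the number of observed pixels under the kernel; the function name `sparse_conv2d` and the parameters `bias` and `eps` are hypothetical and chosen for illustration.

```python
def sparse_conv2d(x, mask, weights, bias=0.0, eps=1e-8):
    """Mask-normalized convolution sketch (single channel, stride 1, zero padding).

    x, mask: H x W lists of lists; mask[i][j] is 1 where x[i][j] is observed, else 0.
    weights: k x k kernel with odd k.
    Returns (output, new_mask), where new_mask marks windows with at least one observation.
    """
    h, w = len(x), len(x[0])
    k = len(weights)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0   # weighted sum over observed pixels only
            valid = 0   # number of observed pixels under the kernel
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w and mask[ii][jj]:
                        acc += weights[di + r][dj + r] * x[ii][jj]
                        valid += 1
            # Dividing by the count of observed pixels keeps the response
            # comparable across different sparsity levels (assumption of this sketch).
            out[i][j] = acc / (valid + eps) + bias
            new_mask[i][j] = 1 if valid > 0 else 0
    return out, new_mask


# Toy input: a constant depth of 5.0 observed at only three pixels.
x = [[5.0, 0.0, 5.0],
     [0.0, 0.0, 0.0],
     [5.0, 0.0, 0.0]]
mask = [[1, 0, 1],
        [0, 0, 0],
        [1, 0, 0]]
ones = [[1.0] * 3 for _ in range(3)]
out, new_mask = sparse_conv2d(x, mask, ones)
```

With an all-ones kernel, each output is the mean of the observed values in its window, so the constant depth is recovered wherever at least one observation falls under the kernel. An unnormalized convolution on the zero-filled input would instead scale with the number of observed pixels.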

Results

Task	Dataset	Metric	Value	Model
Depth Completion	KITTI Depth Completion	MAE [mm]	481	SparseConvs
Depth Completion	KITTI Depth Completion	RMSE [mm]	1601	SparseConvs
Depth Completion	KITTI Depth Completion	Runtime [ms]	10	SparseConvs
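For readers unfamiliar with the two error metrics in the table, here is a small sketch of how MAE and RMSE could be computed for depth maps. It assumes the standard definitions and that errors are averaged only over pixels with ground-truth depth; the function name `masked_errors` and the toy values (read as millimetres for illustration) are hypothetical, not taken from the benchmark's evaluation code.

```python
import math


def masked_errors(pred, target, valid):
    """Return (MAE, RMSE) over pixels where `valid` is truthy.

    pred, target, valid: H x W lists of lists of equal shape.
    """
    abs_sum = 0.0
    sq_sum = 0.0
    n = 0
    for p_row, t_row, v_row in zip(pred, target, valid):
        for p, t, v in zip(p_row, t_row, v_row):
            if v:
                d = p - t
                abs_sum += abs(d)
                sq_sum += d * d
                n += 1
    if n == 0:
        raise ValueError("no valid ground-truth pixels")
    return abs_sum / n, math.sqrt(sq_sum / n)


# Toy 2 x 2 example; the bottom-right pixel has no ground truth and is ignored.
pred = [[1000.0, 2000.0], [3000.0, 0.0]]
target = [[1100.0, 2000.0], [2700.0, 0.0]]
valid = [[1, 1], [1, 0]]
mae, rmse = masked_errors(pred, target, valid)
```

Because RMSE squares each difference before averaging, it penalizes large outliers more heavily than MAE, so it is never smaller than MAE on the same set of pixels.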
