StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction

Sameh Khamis, Sean Fanello, Christoph Rhemann, Adarsh Kowdle, Julien Valentin, Shahram Izadi

2018-07-24 | ECCV 2018 9

Tasks: Stereo Matching, Stereo Matching Hand, Stereo Depth Estimation, Quantization, Depth Prediction, Depth Estimation

Abstract

This paper presents StereoNet, the first end-to-end deep architecture for real-time stereo matching that runs at 60 fps on an NVidia Titan X, producing high-quality, edge-preserved, quantization-free disparity maps. A key insight of this paper is that the network achieves a sub-pixel matching precision that is an order of magnitude higher than that of traditional stereo matching approaches. This allows us to achieve real-time performance by using a very low resolution cost volume that encodes all the information needed to achieve high disparity precision. Spatial precision is achieved by employing a learned edge-aware upsampling function. Our model uses a Siamese network to extract features from the left and right images. A first estimate of the disparity is computed in a very low resolution cost volume, then hierarchically the model re-introduces high-frequency details through a learned upsampling function that uses compact pixel-to-pixel refinement networks. Leveraging color input as a guide, this function is capable of producing high-quality edge-aware output. We achieve compelling results on multiple benchmarks, showing how the proposed method offers extreme flexibility at an acceptable computational budget.
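To make the idea of sub-pixel disparity from a coarse cost volume concrete, here is a minimal sketch on a single scanline. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how disparity is read out of the cost volume, so the softmax-weighted average (soft argmin) used here is an assumption, as are the absolute-difference cost and the function names `cost_volume_row`, `soft_argmin`, and `upsample_disparity`. The nearest-neighbour upsampling is only a stand-in for the learned edge-aware upsampling function described above.

```python
import math


def cost_volume_row(left, right, max_disp):
    """Build cost[x][d] = |left[x] - right[x - d]| for one scanline.

    Assumption: scalar features and an absolute-difference cost.
    Candidates that fall outside the image get a very large cost.
    """
    big = 1e6
    return [
        [abs(left[x] - right[x - d]) if x - d >= 0 else big
         for d in range(max_disp)]
        for x in range(len(left))
    ]


def soft_argmin(costs, temperature=1.0):
    """Return the expected disparity under softmax(-cost / temperature).

    Because the result is a weighted average over integer candidates,
    it can land between them, which gives a sub-pixel estimate.
    """
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


def upsample_disparity(disp_row, factor):
    """Nearest-neighbour upsampling of a disparity row (stand-in only).

    Disparity is measured in pixels, so the values are multiplied by
    the same factor as the resolution.
    """
    return [d * factor for d in disp_row for _ in range(factor)]
```

For example, the costs `[2.0, 0.5, 0.5, 2.0]` are symmetric around the midpoint between candidates 1 and 2, so `soft_argmin` returns 1.5 rather than snapping to either integer. A fractional estimate at low resolution still carries useful precision after `upsample_disparity` scales it up.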

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	sceneflow	Average End-Point Error	1.1	stereonet
3D	sceneflow	Average End-Point Error	1.1	stereonet
Stereo Depth Estimation	sceneflow	Average End-Point Error	1.1	stereonet
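
The metric in the table, average end-point error, is commonly computed for stereo as the mean absolute difference between predicted and ground-truth disparity, in pixels. The sketch below shows that calculation on flat lists of values. The function name `average_epe` is hypothetical, and any masking of invalid pixels that a benchmark may apply is left out.

```python
def average_epe(pred, gt):
    """Mean absolute disparity error in pixels over paired samples.

    Assumption: pred and gt are equal-length sequences of disparities
    for the same pixels, with invalid pixels already removed.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
```

With predictions `[1.0, 2.0, 4.0]` and ground truth `[1.0, 3.0, 2.0]`, the per-pixel errors are 0, 1, and 2, so the average is 1.0 pixel.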
