Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging Epipolar Geometry

Baoru Huang, Jian-Qing Zheng, Stamatia Giannarou, Daniel S. Elson

Published: 2021-04-22
Tasks: Stereo Matching | Stereo Depth Estimation | Depth Prediction | Depth Estimation
Links: Paper | PDF

Abstract

Depth estimation from a stereo image pair has become one of the most explored applications in computer vision, with most previous methods relying on fully supervised learning settings. However, because accurate and scalable ground-truth data are difficult to acquire, training fully supervised methods is challenging. As an alternative, self-supervised methods are becoming more popular as a way to mitigate this challenge. In this paper, we introduce H-Net, a deep-learning framework for unsupervised stereo depth estimation that leverages epipolar geometry to refine stereo matching. For the first time, a Siamese autoencoder architecture is used for depth estimation, which allows mutual information between the rectified stereo images to be extracted. To enforce the epipolar constraint, a mutual epipolar attention mechanism is designed that gives more emphasis to correspondences of features lying on the same epipolar line while learning mutual information between the input stereo pair. Stereo correspondences are further enhanced by incorporating semantic information into the proposed attention mechanism. More specifically, the optimal transport algorithm is used to suppress attention and eliminate outliers in areas not visible to both cameras. Extensive experiments on KITTI 2015 and Cityscapes show that our method outperforms state-of-the-art unsupervised stereo depth estimation methods while closing the gap with fully supervised approaches.
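The key idea in the abstract — restricting attention to features on the same epipolar line — simplifies nicely for rectified stereo, where epipolar lines are image rows. The sketch below illustrates that constraint with a per-row cross-attention over toy feature maps; the function name, shapes, and scaling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_epipolar_attention(feat_l, feat_r):
    """Toy sketch of epipolar-constrained cross-attention.

    For a rectified stereo pair, epipolar lines are horizontal rows, so
    attention between left and right features is computed row by row:
    a left-view feature only attends to right-view features on the same
    epipolar line. feat_l, feat_r: (H, W, C) feature maps (hypothetical
    shapes, not the authors' architecture).
    """
    H, W, C = feat_l.shape
    out = np.empty_like(feat_l)
    for y in range(H):
        # similarity scores only between features on epipolar line y: (W, W)
        scores = feat_l[y] @ feat_r[y].T / np.sqrt(C)
        attn = softmax(scores, axis=-1)
        # aggregate right-view features into the left view
        out[y] = attn @ feat_r[y]
    return out
```

Because each output vector is a convex combination of the right-view features on its own row, off-row correspondences are excluded by construction — the "mask" is implicit in the per-row loop.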

Results

Task             | Dataset    | Metric                           | Value   | Model
-----------------|------------|----------------------------------|---------|------------------------
Depth Estimation | KITTI 2015 | Absolute relative error (AbsRel) | 0.076   | H-Net (Ours) Full Eigen
Depth Estimation | KITTI 2015 | RMSE                             | 0.04025 | H-Net (Ours) Full Eigen
Depth Estimation | KITTI 2015 | Sq Rel                           | 0.607   | H-Net (Ours) Full Eigen
Depth Estimation | KITTI 2015 | Absolute relative error (AbsRel) | 0.094   | H-Net (Ours)
Depth Estimation | KITTI 2015 | Sq Rel                           | 0.6     | H-Net (Ours)
3D               | KITTI 2015 | Absolute relative error (AbsRel) | 0.076   | H-Net (Ours) Full Eigen
3D               | KITTI 2015 | RMSE                             | 0.04025 | H-Net (Ours) Full Eigen
3D               | KITTI 2015 | Sq Rel                           | 0.607   | H-Net (Ours) Full Eigen
3D               | KITTI 2015 | Absolute relative error (AbsRel) | 0.094   | H-Net (Ours)
3D               | KITTI 2015 | Sq Rel                           | 0.6     | H-Net (Ours)
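The metrics in the table (AbsRel, Sq Rel, RMSE) are the standard depth-evaluation measures used on KITTI. A minimal sketch of how they are typically computed follows; this is not the authors' evaluation code, and the zero-valued invalid-pixel mask is an assumed convention.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth-estimation error metrics over valid ground-truth pixels.

    AbsRel = mean(|pred - gt| / gt)
    Sq Rel = mean((pred - gt)^2 / gt)
    RMSE   = sqrt(mean((pred - gt)^2))
    Pixels with gt == 0 are treated as invalid (an assumed convention).
    """
    mask = gt > 0
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    sq_rel = np.mean((p - g) ** 2 / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    return abs_rel, sq_rel, rmse
```

A perfect prediction yields 0.0 on all three metrics; lower is better for each.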

Related Papers

- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
- Cameras as Relative Positional Encoding (2025-07-14)
- ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)