Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Youhong Wang, Yunji Liang, Hao Xu, Shaohui Jiao, Hongkai Yu

2023-09-01 · Zero-shot Generalization · Autonomous Driving · Depth Estimation · Monocular Depth Estimation

Abstract

Recently, self-supervised monocular depth estimation has gained popularity, with numerous applications in autonomous driving and robotics. However, existing solutions primarily estimate depth from immediate visual features; they struggle to recover fine-grained scene details and generalize poorly. In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structures from motion. In SQLdepth, we propose a novel Self Query Layer (SQL) to build a self-cost volume and infer depth from it, rather than inferring depth from feature maps. The self-cost volume implicitly captures the intrinsic geometry of the scene within a single frame. Each individual slice of the volume signifies the relative distances between points and objects within a latent space. Ultimately, this volume is compressed to the depth map via a novel decoding approach. Experimental results on KITTI and Cityscapes show that our method attains state-of-the-art performance (AbsRel = $0.082$ on KITTI, $0.052$ on KITTI with improved ground truth, and $0.106$ on Cityscapes), corresponding to $9.9\%$, $5.5\%$, and $4.5\%$ error reductions over the previous best, respectively. In addition, our approach showcases reduced training complexity, computational efficiency, improved generalization, and the ability to recover fine-grained scene details. Moreover, the self-supervised pre-trained and metric fine-tuned SQLdepth can surpass existing supervised methods by significant margins (AbsRel = $0.043$, a $14\%$ error reduction). The self-matching-oriented relative distance querying in SQL improves the robustness and zero-shot generalization capability of SQLdepth. Code is available at https://github.com/hisfog/SQLdepth-Impl, and the pre-trained weights will be made publicly available.
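
The abstract describes the self-cost volume only at a high level. As a rough illustration, the sketch below treats each slice as a similarity map between per-pixel features and one query vector. The dot-product similarity, the function name `self_cost_volume`, and the nested-list layout are all assumptions made for this example; the abstract does not specify any of them, so this should not be read as the authors' implementation.

```python
def self_cost_volume(features, queries):
    """Hypothetical sketch: one similarity map per query vector.

    features: H x W x C nested lists of per-pixel feature vectors.
    queries:  Q x C nested lists of query vectors.
    Returns a Q x H x W volume where volume[q][i][j] is the dot product
    between the feature at pixel (i, j) and query q (an assumed
    similarity measure, not one stated in the abstract).
    """
    return [
        [
            [sum(f * q for f, q in zip(pixel, query)) for pixel in row]
            for row in features
        ]
        for query in queries
    ]


# Toy input: a 1 x 2 image with 2-channel features and two queries.
features = [[[1.0, 0.0], [0.0, 1.0]]]
queries = [[1.0, 0.0], [2.0, 3.0]]
volume = self_cost_volume(features, queries)
```

Under this reading, each of the Q slices scores every pixel against a single query, which is one way a slice could encode relative relationships between points and objects in a latent space.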

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.983 | SQLdepth (ConvNeXt-L) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | SQLdepth (ConvNeXt-L) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 0.999 | SQLdepth (ConvNeXt-L) |
| Depth Estimation | KITTI Eigen split | RMSE | 1.698 | SQLdepth (ConvNeXt-L) |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.064 | SQLdepth (ConvNeXt-L) |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.105 | SQLdepth (ConvNeXt-L) |
| Depth Estimation | KITTI Eigen split | Absolute relative error | 0.043 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.983 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.998 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 0.999 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | RMSE | 1.698 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | RMSE log | 0.064 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | Sq Rel | 0.105 | SQLdepth (ConvNeXt-L) |
| 3D | KITTI Eigen split | Absolute relative error | 0.043 | SQLdepth (ConvNeXt-L) |
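
The metrics reported above are the ones conventionally used for monocular depth evaluation. The sketch below computes them from paired predicted and ground-truth depths using their standard definitions. The function name `depth_metrics` and the flat-list input are illustrative choices, and the sketch omits benchmark-specific steps such as cropping, depth capping, and median scaling, so it will not reproduce the listed numbers on its own.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth error metrics over paired, strictly positive depths.

    pred, gt: equal-length sequences of predicted and ground-truth depths.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Mean absolute error relative to the true depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean squared error relative to the true depth.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root mean squared error in linear and log space.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    # Threshold accuracy: fraction of points with max(p/g, g/p) < 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_1": deltas[1],
        "delta_2": deltas[2],
        "delta_3": deltas[3],
    }


# Toy example with three points (values are made up for illustration).
metrics = depth_metrics([2.0, 4.0, 10.0], [2.0, 5.0, 8.0])
```

Lower is better for the error metrics (absolute relative error, Sq Rel, RMSE, RMSE log), while higher is better for the Delta thresholds.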

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)