
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation

Ruijie Zhu, Chuxin Wang, Ziyang Song, Li Liu, Tianzhu Zhang, Yongdong Zhang

2024-07-11 · Depth Estimation · Monocular Depth Estimation

Abstract

Estimating depth from a single image is a challenging visual task. Compared to relative depth estimation, metric depth estimation attracts more attention due to its practical physical significance and critical applications in real-life scenarios. However, existing metric depth estimation methods are typically trained on specific datasets with similar scenes, and they struggle to generalize across scenes with significant scale variations. To address this challenge, we propose a novel monocular depth estimation method called ScaleDepth. Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction (SASP) module and an adaptive relative depth estimation (ARDE) module, respectively. The proposed ScaleDepth has several merits. First, the SASP module can implicitly combine structural and semantic features of the images to predict precise scene scales. Second, the ARDE module can adaptively estimate the relative depth distribution of each image within a normalized depth space. Third, our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework, without the need to set the depth range or fine-tune the model. Extensive experiments demonstrate that our method attains state-of-the-art performance across indoor, outdoor, unconstrained, and unseen scenes. Project page: https://ruijiezhu94.github.io/ScaleDepth
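
The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact formula, so the code below assumes that metric depth is the product of a scalar scene scale (in metres) and a relative depth map normalized to (0, 1], and it assumes the scale is taken as the maximum depth in the image. The function names `compose_metric_depth` and `decompose_metric_depth` are hypothetical.

```python
def compose_metric_depth(relative_depth, scene_scale):
    """Recover metric depth from a normalized relative depth map.

    Assumption (not stated in the abstract): metric = scale * relative.
    relative_depth is a list of rows with values in (0, 1];
    scene_scale is a positive scalar in metres.
    """
    if scene_scale <= 0:
        raise ValueError("scene_scale must be positive")
    return [[scene_scale * r for r in row] for row in relative_depth]


def decompose_metric_depth(metric_depth):
    """Split a metric depth map into (scene_scale, relative_depth).

    Assumption: the scene scale is the maximum depth in the map, so the
    relative depth lies in (0, 1] for strictly positive inputs.
    """
    scale = max(max(row) for row in metric_depth)
    if scale <= 0:
        raise ValueError("metric depth must contain positive values")
    relative = [[d / scale for d in row] for row in metric_depth]
    return scale, relative


# Usage: an indoor-like scene with a scale of 8 m.
relative = [[0.25, 0.5], [1.0, 0.75]]
metric = compose_metric_depth(relative, 8.0)
scale, recovered = decompose_metric_depth(metric)
```

Under these assumptions, the same relative depth map yields an indoor or an outdoor metric map depending only on the predicted scale, which is the intuition behind handling both scene types in one framework.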

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.957 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.994 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.267 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.074 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | log10 | 0.032 | ScaleDepth-N |
| Depth Estimation | IBims-1 | RMSE | 0.59 | ScaleDepth-NK |
| Depth Estimation | IBims-1 | absolute relative error | 0.164 | ScaleDepth-NK |
| Depth Estimation | IBims-1 | Delta < 1.25 | 0.778 | ScaleDepth-NK |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.98 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 1 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | RMSE | 1.987 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.073 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.136 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.048 | ScaleDepth-K |
| Depth Estimation | DDAD | Delta < 1.25 | 0.871 | ScaleDepth-NK |
| Depth Estimation | DDAD | RMSE | 6.097 | ScaleDepth-NK |
| Depth Estimation | DDAD | absolute relative error | 0.121 | ScaleDepth-NK |
| Depth Estimation | SUN-RGBD | Delta < 1.25 | 0.866 | ScaleDepth-NK |
| Depth Estimation | SUN-RGBD | RMSE | 0.359 | ScaleDepth-NK |
| Depth Estimation | SUN-RGBD | absolute relative error | 0.129 | ScaleDepth-NK |
| Depth Estimation | DIODE Indoor | Delta < 1.25 | 0.447 | ScaleDepth-NK |
| Depth Estimation | DIODE Indoor | RMSE | 1.443 | ScaleDepth-NK |
| Depth Estimation | DIODE Indoor | absolute relative error | 0.355 | ScaleDepth-NK |
| Depth Estimation | Hypersim | Delta < 1.25 | 0.413 | ScaleDepth-NK |
| Depth Estimation | Hypersim | RMSE | 4.825 | ScaleDepth-NK |
| Depth Estimation | Hypersim | absolute relative error | 0.381 | ScaleDepth-NK |
| Depth Estimation | Virtual KITTI 2 | Delta < 1.25 | 0.834 | ScaleDepth-NK |
| Depth Estimation | Virtual KITTI 2 | RMSE | 4.747 | ScaleDepth-NK |
| Depth Estimation | Virtual KITTI 2 | absolute relative error | 0.12 | ScaleDepth-NK |
| Depth Estimation | DIML Outdoor | Delta < 1.25 | 0.058 | ScaleDepth-NK |
| Depth Estimation | DIML Outdoor | RMSE | 4.344 | ScaleDepth-NK |
| Depth Estimation | DIML Outdoor | absolute relative error | 1.007 | ScaleDepth-NK |
| Depth Estimation | DIODE Outdoor | Delta < 1.25 | 0.262 | ScaleDepth-NK |
| Depth Estimation | DIODE Outdoor | RMSE | 8.632 | ScaleDepth-NK |
| Depth Estimation | DIODE Outdoor | absolute relative error | 0.562 | ScaleDepth-NK |

The same results, with identical values and models, are also listed under the 3D task.
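
The page does not define the metrics it reports. The sketch below uses the definitions that are standard in the monocular depth estimation literature (threshold accuracy, RMSE, RMSE log, absolute and squared relative error, mean log10 error). It is an illustration only and does not reproduce the paper's evaluation protocol: any cropping, depth caps, or masking rules used for the reported numbers are not given here and are not modeled. The function name `depth_metrics` and its dictionary keys are hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-error metrics for flat lists of depths (metres).

    Pixels whose ground truth is non-positive are treated as invalid and
    skipped. Predictions are assumed to be strictly positive.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred) < 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]

    def delta(k):
        return sum(1 for r in ratios if r < 1.25 ** k) / n

    return {
        "delta1": delta(1),
        "delta2": delta(2),
        "delta3": delta(3),
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


# Usage: three pixels, one of which is overestimated by a factor of two.
metrics = depth_metrics([2.0, 4.0, 1.0], [2.0, 2.0, 1.0])
```

For the threshold accuracies higher is better, while for the error metrics lower is better, which is how the values in the table should be read.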

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
Cameras as Relative Positional Encoding (2025-07-14)
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)