Ruijie Zhu, Chuxin Wang, Ziyang Song, Li Liu, Tianzhu Zhang, Yongdong Zhang
Estimating depth from a single image is a challenging visual task. Compared to relative depth estimation, metric depth estimation attracts more attention because it carries physical meaning and is critical to real-world applications. However, existing metric depth estimation methods are typically trained on specific datasets with similar scenes, and they struggle to generalize across scenes with significant scale variations. To address this challenge, we propose a novel monocular depth estimation method called ScaleDepth. Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction (SASP) module and an adaptive relative depth estimation (ARDE) module, respectively. The proposed ScaleDepth enjoys several merits. First, the SASP module can implicitly combine structural and semantic features of the images to predict precise scene scales. Second, the ARDE module can adaptively estimate the relative depth distribution of each image within a normalized depth space. Third, our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework, without the need to set the depth range or fine-tune the model. Extensive experiments demonstrate that our method attains state-of-the-art performance across indoor, outdoor, unconstrained, and unseen scenes. Project page: https://ruijiezhu94.github.io/ScaleDepth
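
The decomposition described above can be illustrated with a minimal sketch. It assumes that the scene scale is a single positive scalar per image and that it combines with the normalized relative depth by plain multiplication; the abstract does not state the exact combination rule, and the function name `compose_metric_depth` is hypothetical rather than part of the released code.

```python
from typing import List


def compose_metric_depth(scale: float, relative_depth: List[List[float]]) -> List[List[float]]:
    """Rescale a normalized relative depth map into metric depth (meters).

    Assumption: metric depth = scene scale * relative depth, with relative
    depth normalized to [0, 1]. This is an illustrative simplification.
    """
    if scale <= 0:
        raise ValueError("scene scale must be positive")
    for row in relative_depth:
        for r in row:
            if not 0.0 <= r <= 1.0:
                raise ValueError("relative depth must lie in [0, 1]")
    return [[scale * r for r in row] for row in relative_depth]


# The same relative structure maps to very different metric ranges
# depending on the predicted scene scale (values chosen for illustration).
relative = [[0.125, 0.5], [0.25, 1.0]]
indoor = compose_metric_depth(10.0, relative)   # e.g. a room-sized scene
outdoor = compose_metric_depth(80.0, relative)  # e.g. a street scene
```

Under this assumption, the relative depth branch never needs to know the absolute depth range, which is what allows one model to cover indoor and outdoor scenes without a preset range.
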
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.957 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.994 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.267 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.074 | ScaleDepth-N |
| Depth Estimation | NYU-Depth V2 | log 10 | 0.032 | ScaleDepth-N |
| Depth Estimation | IBims-1 | RMSE | 0.59 | ScaleDepth-NK |
| Depth Estimation | IBims-1 | absolute relative error | 0.164 | ScaleDepth-NK |
| Depth Estimation | IBims-1 | Delta < 1.25 | 0.778 | ScaleDepth-NK |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.98 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 1 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | RMSE | 1.987 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.073 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.136 | ScaleDepth-K |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.048 | ScaleDepth-K |
| Depth Estimation | DDAD | Delta < 1.25 | 0.871 | ScaleDepth-NK |
| Depth Estimation | DDAD | RMSE | 6.097 | ScaleDepth-NK |
| Depth Estimation | DDAD | absolute relative error | 0.121 | ScaleDepth-NK |
| Depth Estimation | SUN-RGBD | Delta < 1.25 | 0.866 | ScaleDepth-NK |
| Depth Estimation | SUN-RGBD | RMSE | 0.359 | ScaleDepth-NK |
| Depth Estimation | SUN-RGBD | absolute relative error | 0.129 | ScaleDepth-NK |
| Depth Estimation | DIODE Indoor | Delta < 1.25 | 0.447 | ScaleDepth-NK |
| Depth Estimation | DIODE Indoor | RMSE | 1.443 | ScaleDepth-NK |
| Depth Estimation | DIODE Indoor | absolute relative error | 0.355 | ScaleDepth-NK |
| Depth Estimation | Hypersim | Delta < 1.25 | 0.413 | ScaleDepth-NK |
| Depth Estimation | Hypersim | RMSE | 4.825 | ScaleDepth-NK |
| Depth Estimation | Hypersim | absolute relative error | 0.381 | ScaleDepth-NK |
| Depth Estimation | Virtual KITTI 2 | Delta < 1.25 | 0.834 | ScaleDepth-NK |
| Depth Estimation | Virtual KITTI 2 | RMSE | 4.747 | ScaleDepth-NK |
| Depth Estimation | Virtual KITTI 2 | absolute relative error | 0.12 | ScaleDepth-NK |
| Depth Estimation | DIML Outdoor | Delta < 1.25 | 0.058 | ScaleDepth-NK |
| Depth Estimation | DIML Outdoor | RMSE | 4.344 | ScaleDepth-NK |
| Depth Estimation | DIML Outdoor | absolute relative error | 1.007 | ScaleDepth-NK |
| Depth Estimation | DIODE Outdoor | Delta < 1.25 | 0.262 | ScaleDepth-NK |
| Depth Estimation | DIODE Outdoor | RMSE | 8.632 | ScaleDepth-NK |
| Depth Estimation | DIODE Outdoor | absolute relative error | 0.562 | ScaleDepth-NK |
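
The metrics in the table are not defined on this page. The sketch below uses the definitions that are standard in the monocular depth literature: threshold accuracy (the fraction of pixels with max(pred/gt, gt/pred) below 1.25^k), absolute and squared relative error, RMSE, RMSE in log space, and mean log10 error. It assumes flat lists of strictly positive depths in meters and omits evaluation details such as depth caps, crops, and invalid-pixel masks, which vary by benchmark; the function name `depth_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Common depth metrics over paired, strictly positive depths (meters)."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(p <= 0 for p in pred) or any(g <= 0 for g in gt):
        raise ValueError("depths must be strictly positive")

    n = len(gt)
    # Symmetric ratio used by the threshold accuracies (Delta < 1.25^k).
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt)]
    sq_err = [(p - g) ** 2 for p, g in zip(pred, gt)]
    log_err = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]

    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "abs_rel": sum(abs(p - g) / g for p, g in zip(pred, gt)) / n,
        "sq_rel": sum(e / g for e, g in zip(sq_err, gt)) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rmse_log": math.sqrt(sum(e * e for e in log_err) / n),
        # |log10(pred) - log10(gt)| equals |ln(pred) - ln(gt)| / ln(10).
        "log10": sum(abs(e) for e in log_err) / (n * math.log(10)),
    }


# Toy example with four pixels; the values are illustrative only.
metrics = depth_metrics(pred=[1.0, 2.0, 3.0, 8.0], gt=[1.0, 2.0, 4.0, 4.0])
```

Higher is better for the Delta accuracies and lower is better for the error metrics, which is why a row such as DIML Outdoor (Delta < 1.25 of 0.058, absolute relative error above 1) indicates a much harder transfer setting than NYU-Depth V2.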