Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid
We propose SC-Depth, a monocular depth estimator that requires only unlabelled videos for training and produces scale-consistent predictions at inference time. Our contributions are: (i) a geometry consistency loss, which penalizes inconsistency between the depths predicted for adjacent views; (ii) a self-discovered mask, which automatically localizes moving objects that violate the underlying static-scene assumption and introduce noisy signals during training; and (iii) a detailed ablation study demonstrating the efficacy of each component, together with high-quality depth estimation results on both the KITTI and NYUv2 datasets. Moreover, because the predictions are scale-consistent, our monocular-trained deep networks can be readily integrated into the ORB-SLAM2 system for more robust and accurate tracking. The resulting hybrid Pseudo-RGBD SLAM system shows compelling results on KITTI and generalizes well to the KAIST dataset without additional training. Finally, we provide several demos for qualitative evaluation.
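The geometry consistency loss and the self-discovered mask can be illustrated with a minimal NumPy sketch. It assumes that the predicted depth of view a is back-projected with pinhole intrinsics `K`, moved into view b with a known relative pose (`R`, `t`), and compared with view b's own prediction through a normalized depth difference |z_b − d_b| / (z_b + d_b); the mask is then taken as one minus that difference. The normalized form, the mask definition, nearest-neighbour sampling (a trainable version would need differentiable bilinear sampling) and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def warp_depth(depth_a, K, R, t):
    """Back-project view a's depth, move it into view b, and re-project.

    Returns pixel coordinates (u_b, v_b) in view b, the depth z_b of each
    3D point as seen from view b, and a boolean mask of points in front of b.
    """
    h, w = depth_a.shape
    v, u = np.mgrid[0:h, 0:w]  # v = row index, u = column index
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    pts_a = (np.linalg.inv(K) @ pix) * depth_a.reshape(1, -1)  # 3D points in frame a
    pts_b = R @ pts_a + np.asarray(t, dtype=np.float64).reshape(3, 1)  # rigid motion a -> b
    z_b = pts_b[2]
    in_front = z_b > 1e-6
    z_safe = np.where(in_front, z_b, 1.0)  # avoid division by zero behind the camera
    proj = K @ pts_b
    u_b, v_b = proj[0] / z_safe, proj[1] / z_safe
    return (u_b.reshape(h, w), v_b.reshape(h, w),
            z_b.reshape(h, w), in_front.reshape(h, w))


def geometry_consistency(depth_a, depth_b, K, R, t):
    """Hypothetical helper: returns (loss, mask) for one pair of adjacent views.

    Both depth maps are assumed to have the same shape; the mask lives on
    view a's pixel grid.
    """
    h, w = depth_b.shape
    u_b, v_b, z_b, in_front = warp_depth(depth_a, K, R, t)
    ui, vi = np.round(u_b).astype(int), np.round(v_b).astype(int)
    valid = in_front & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)

    # Nearest-neighbour lookup of view b's own prediction at the warped pixels.
    sampled = np.zeros((h, w))
    sampled[valid] = depth_b[vi[valid], ui[valid]]

    # Normalized inconsistency, in [0, 1) for positive depths.
    diff = np.zeros((h, w))
    diff[valid] = np.abs(z_b[valid] - sampled[valid]) / (z_b[valid] + sampled[valid])

    loss = float(diff[valid].mean()) if valid.any() else 0.0
    # Assumed mask form: inconsistent pixels (e.g. moving objects) get low
    # weight; pixels that warp outside view b get weight 0.
    mask = np.where(valid, 1.0 - diff, 0.0)
    return loss, mask


def masked_photometric_loss(photo_error, mask):
    """Hypothetical helper: down-weight a per-pixel photometric error map."""
    return float((mask * photo_error).mean())
```

With an identity pose and two identical depth maps, the loss is 0 and the mask is 1 everywhere; doubling view b's depth gives a loss of 1/3 and a mask of 2/3. Under a camera-motion-only warp, pixels on independently moving objects would tend to show large depth disagreement, so this form of mask gives them low weight.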
Reported depth estimation results (task tags: Depth Estimation, 3D). For error metrics (AbsRel, RMSE, RMSE log) lower is better; for δ accuracies higher is better. RMSE is in metres.

KITTI (Eigen split):

| Model | AbsRel ↓ | RMSE ↓ | RMSE log ↓ | δ < 1.25 ↑ | δ < 1.25² ↑ | δ < 1.25³ ↑ |
|---|---|---|---|---|---|---|
| SC-Depth (ResNet-50) | 0.114 | 4.706 | 0.191 | 0.873 | 0.96 | 0.982 |
| SC-Depth (ResNet-18) | 0.119 | 4.95 | 0.197 | 0.863 | 0.957 | 0.981 |

NYU-Depth V2 (self-supervised); δ accuracies are given in percent:

| Model | AbsRel ↓ | RMSE ↓ | δ < 1.25 (%) ↑ | δ < 1.25² (%) ↑ | δ < 1.25³ (%) ↑ |
|---|---|---|---|---|---|
| Bian et al. | 0.157 | 0.593 | 78 | 94 | 98.4 |
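The metrics in the results tables follow the standard definitions used for monocular depth benchmarks. The sketch below computes them with NumPy, assuming per-image median scaling of the prediction (common for monocular methods whose depth is only known up to scale) and a valid-depth range. The 80 m cap is a common KITTI convention; the function name and defaults are illustrative assumptions, not the authors' evaluation script.

```python
import numpy as np


def depth_metrics(pred, gt, median_scaling=True, min_depth=1e-3, max_depth=80.0):
    """Hypothetical helper: standard depth metrics over valid ground-truth pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = (gt > min_depth) & (gt < max_depth)  # ignore missing / out-of-range GT
    pred, gt = pred[valid], gt[valid]
    if median_scaling:
        # Align the unknown global scale of a monocular prediction to the GT.
        pred = pred * (np.median(gt) / np.median(pred))
    pred = np.clip(pred, min_depth, max_depth)

    ratio = np.maximum(pred / gt, gt / pred)  # symmetric ratio for the delta metrics
    return {
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }
```

For NYU-Depth V2, a 10 m cap is commonly used instead, and the δ values would be multiplied by 100 to match the percentages in the table above.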