Doyeon Kim, Woonghyun Ka, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim
Depth estimation from a single image is an important task with applications across many fields of computer vision, and it has advanced rapidly with the development of convolutional neural networks. In this paper, we propose a novel architecture and training strategy for monocular depth estimation that further improves the prediction accuracy of the network. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder that generates the estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. In addition, the proposed decoder outperforms previously proposed decoders with considerably lower computational complexity. Furthermore, we improve the depth-specific augmentation method by exploiting an important observation in depth estimation to enhance the model. Our network achieves state-of-the-art performance on the challenging NYU Depth V2 dataset. Extensive experiments validate the effectiveness of the proposed approach. Finally, our model shows better generalisation ability and robustness than the compared models.
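The selective feature fusion idea described above can be illustrated with a minimal sketch. It assumes, as we understand the module, that the concatenated local and global features are mapped to a two-channel attention map through a sigmoid, and that each input feature is scaled by its own attention channel before the two are summed. The paper's convolutional layers are replaced here by a hypothetical per-pixel linear projection (`w_local`, `w_global`), and features are plain nested lists rather than tensors; this is an illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

# A feature map stored as [channels][pixels], i.e. a flattened C x (H*W) map.
Feature = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def selective_feature_fusion(
    local: Feature,
    global_: Feature,
    w_local: Sequence[float],
    w_global: Sequence[float],
) -> Feature:
    """Fuse two same-shaped feature maps with per-pixel attention weights.

    w_local and w_global are hypothetical projection weights (length 2*C)
    standing in for the paper's conv layers; each produces one channel of
    a two-channel attention map from the concatenated features.
    """
    channels, pixels = len(local), len(local[0])
    if len(global_) != channels or any(len(ch) != pixels for ch in global_):
        raise ValueError("local and global features must have the same shape")
    if len(w_local) != 2 * channels or len(w_global) != 2 * channels:
        raise ValueError("projection weights must have length 2 * channels")

    fused: Feature = [[0.0] * pixels for _ in range(channels)]
    for p in range(pixels):
        # Concatenate the local and global feature vectors at this pixel.
        concat = [local[c][p] for c in range(channels)] + [
            global_[c][p] for c in range(channels)
        ]
        # Two attention logits, squashed to (0, 1) by the sigmoid.
        a_local = sigmoid(sum(w * x for w, x in zip(w_local, concat)))
        a_global = sigmoid(sum(w * x for w, x in zip(w_global, concat)))
        # Weight each path by its attention value and add the results.
        for c in range(channels):
            fused[c][p] = a_local * local[c][p] + a_global * global_[c][p]
    return fused
```

With all-zero weights both attention values are 0.5, so the output is the average of the two inputs; strongly positive `w_local` and strongly negative `w_global` make the module pass the local path through almost unchanged. In a trained network these weights would be learned, letting each pixel choose how much to rely on local detail versus global context.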
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | δ < 1.25 | 0.915 | GLPDepth |
| Depth Estimation | NYU-Depth V2 | δ < 1.25² | 0.988 | GLPDepth |
| Depth Estimation | NYU-Depth V2 | δ < 1.25³ | 0.997 | GLPDepth |
| Depth Estimation | NYU-Depth V2 | RMSE (m) | 0.344 | GLPDepth |
| Depth Estimation | NYU-Depth V2 | Absolute relative error | 0.098 | GLPDepth |
| Depth Estimation | NYU-Depth V2 | log10 error | 0.042 | GLPDepth |
| Depth Estimation | KITTI Eigen split | δ < 1.25 | 0.967 | GLPDepth |
| Depth Estimation | KITTI Eigen split | δ < 1.25² | 0.996 | GLPDepth |
| Depth Estimation | KITTI Eigen split | δ < 1.25³ | 0.999 | GLPDepth |
| Depth Estimation | KITTI Eigen split | RMSE (m) | 2.297 | GLPDepth |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.086 | GLPDepth |
| Depth Estimation | KITTI Eigen split | Absolute relative error | 0.057 | GLPDepth |
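The metrics in the table follow the definitions commonly used for NYU-Depth V2 and KITTI: threshold accuracies δ < 1.25ᵏ (higher is better) and the absolute relative, RMSE, RMSE log, and log10 errors (lower is better). The following minimal sketch computes them over flattened depth lists. It assumes pixels with non-positive ground truth are invalid, and it clamps predictions to a hypothetical `min_depth` so the logarithms stay defined; the dataset-specific evaluation crops and maximum-depth caps used in the official protocols are not included.

```python
import math
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float], gt: Sequence[float], min_depth: float = 1e-3
) -> Dict[str, float]:
    """Standard monocular depth metrics over valid pixels (gt > 0)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep only valid ground-truth pixels; clamp predictions to min_depth.
    pairs = [(max(p, min_depth), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)

    # Threshold accuracy uses max(pred/gt, gt/pred) per pixel.
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }
```

A perfect prediction yields δ accuracies of 1.0 and zero for every error term; RMSE carries the units of the input depths (metres for both datasets above).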