Shariq Farooq Bhat, Ibraheem Alhashim, Peter Wonka
We address the problem of estimating a high-quality dense depth map from a single RGB input image. We start with a baseline encoder-decoder convolutional neural network architecture and ask how global processing of information can improve overall depth estimation. To this end, we propose a transformer-based architecture block that divides the depth range into bins whose center values are estimated adaptively per image. The final depth values are estimated as linear combinations of the bin centers. We call our new building block AdaBins. Our results show a decisive improvement over the state of the art on several popular depth datasets across all metrics. We also validate the effectiveness of the proposed block with an ablation study and provide the code and corresponding pre-trained weights of the new state-of-the-art model.
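
The abstract states only that the bin centers are estimated per image and that each depth value is a linear combination of those centers. The sketch below illustrates that idea in plain Python under several assumptions that are not stated in the text above: the per-image bin widths are normalized to sum to 1, the per-pixel combination weights come from a softmax over per-pixel scores, and the names `bin_centers`, `pixel_depth`, `d_min`, and `d_max` are hypothetical. It is not the released implementation.

```python
import math


def bin_centers(raw_widths, d_min, d_max):
    """Map positive per-image width scores to bin centers inside [d_min, d_max].

    Assumption: widths are normalized so the bins tile the whole depth range.
    """
    total = sum(raw_widths)
    centers = []
    left_edge = d_min
    for w in raw_widths:
        span = (d_max - d_min) * (w / total)  # width of this bin in depth units
        centers.append(left_edge + span / 2.0)  # center sits mid-bin
        left_edge += span
    return centers


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def pixel_depth(scores, centers):
    """Depth of one pixel as a linear combination of the bin centers.

    Assumption: the combination weights are softmax probabilities.
    """
    weights = softmax(scores)
    return sum(w * c for w, c in zip(weights, centers))


# Hypothetical example: three bins over a depth range of 0 to 8.
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
depth = pixel_depth([0.0, 0.0, 0.0], centers)
```

Because the weights are non-negative and sum to 1 in this sketch, the predicted depth always lies between the smallest and largest bin center, and it varies smoothly with the per-pixel scores rather than snapping to a single bin.
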
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | RMS | 0.364 | AdaBins |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.903 | AdaBins |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.984 | AdaBins |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.997 | AdaBins |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.364 | AdaBins |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.103 | AdaBins |
| Depth Estimation | NYU-Depth V2 | log 10 | 0.044 | AdaBins |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.964 | AdaBins |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.995 | AdaBins |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 0.999 | AdaBins |
| Depth Estimation | KITTI Eigen split | RMSE | 2.36 | AdaBins |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.088 | AdaBins |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.058 | AdaBins |
| 3D | NYU-Depth V2 | RMS | 0.364 | AdaBins |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.903 | AdaBins |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.984 | AdaBins |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.997 | AdaBins |
| 3D | NYU-Depth V2 | RMSE | 0.364 | AdaBins |
| 3D | NYU-Depth V2 | absolute relative error | 0.103 | AdaBins |
| 3D | NYU-Depth V2 | log 10 | 0.044 | AdaBins |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.964 | AdaBins |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.995 | AdaBins |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 0.999 | AdaBins |
| 3D | KITTI Eigen split | RMSE | 2.36 | AdaBins |
| 3D | KITTI Eigen split | RMSE log | 0.088 | AdaBins |
| 3D | KITTI Eigen split | absolute relative error | 0.058 | AdaBins |
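
The table names its metrics without defining them. The sketch below computes them using the definitions commonly used for monocular depth benchmarks; it assumes that these are the definitions behind the reported numbers, which this page does not confirm. The function name `depth_metrics` and its dictionary keys are hypothetical, and any dataset-specific evaluation protocol (cropping, depth caps, masking of invalid pixels) is left out. The `RMS` and `RMSE` rows report the same value, so the sketch treats them as one quantity.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-estimation metrics over paired positive depths.

    pred, gt: equal-length sequences of predicted and ground-truth depths.
    """
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Threshold accuracy uses the larger of the two ratios per pixel.
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        # Fraction of pixels whose ratio is below 1.25, 1.25^2, 1.25^3.
        "delta_1": sum(r < 1.25 for r in ratios) / n,
        "delta_2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta_3": sum(r < 1.25 ** 3 for r in ratios) / n,
        # Mean absolute error relative to the ground-truth depth.
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        # Root mean squared error in depth units.
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        # Root mean squared error of natural-log depths.
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        # Mean absolute error of base-10 log depths.
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


# Hypothetical two-pixel example.
metrics = depth_metrics(pred=[2.0, 4.0], gt=[2.0, 5.0])
```

For the threshold accuracies higher is better, while for the error metrics (`abs_rel`, `rmse`, `rmse_log`, `log10`) lower is better.
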