Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wen-Jun Zeng
Recently, unsupervised learning of depth from videos has made remarkable progress, with results comparable to fully supervised methods in outdoor scenes such as KITTI. However, great challenges remain when directly applying this technology to indoor environments, e.g., large textureless regions such as white walls, the more complex ego-motion of handheld cameras, transparent glass, and shiny objects. To overcome these problems, we propose a new optical-flow-based training paradigm that reduces the difficulty of unsupervised learning by providing a clearer training target and that handles textureless regions. Our experimental evaluation demonstrates that the results of our method are comparable to fully supervised methods on the NYU Depth V2 benchmark. To the best of our knowledge, these are the first quantitative results reported for a purely unsupervised learning method on indoor datasets.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 self-supervised | Absolute relative error (AbsRel) | 0.208 | Zhou et al |
| Depth Estimation | NYU-Depth V2 self-supervised | Root mean square error (RMSE) | 0.712 | Zhou et al |
| Depth Estimation | NYU-Depth V2 self-supervised | delta_1 | 67.4 | Zhou et al |
| Depth Estimation | NYU-Depth V2 self-supervised | delta_2 | 90 | Zhou et al |
| Depth Estimation | NYU-Depth V2 self-supervised | delta_3 | 96.8 | Zhou et al |
| 3D | NYU-Depth V2 self-supervised | Absolute relative error (AbsRel) | 0.208 | Zhou et al |
| 3D | NYU-Depth V2 self-supervised | Root mean square error (RMSE) | 0.712 | Zhou et al |
| 3D | NYU-Depth V2 self-supervised | delta_1 | 67.4 | Zhou et al |
| 3D | NYU-Depth V2 self-supervised | delta_2 | 90 | Zhou et al |
| 3D | NYU-Depth V2 self-supervised | delta_3 | 96.8 | Zhou et al |
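
The metrics in the table follow the definitions commonly used for depth benchmarks: AbsRel is the mean of |pred - gt| / gt, RMSE is the root of the mean squared error, and delta_k is the percentage of pixels whose ratio max(pred/gt, gt/pred) is below 1.25^k. The sketch below shows one way these could be computed with NumPy. The function name `depth_metrics`, the validity mask `gt > 0`, and the optional median scaling (often applied to scale-ambiguous monocular predictions) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def depth_metrics(pred, gt, median_scale=False):
    """Compute AbsRel, RMSE, and delta_1..3 (in percent) over valid pixels.

    Hypothetical helper: the masking and scaling choices are assumptions,
    not the authors' evaluation protocol.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Assumption: pixels with non-positive ground truth have no measurement.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    # Guard against division by zero in the ratio below.
    pred = np.clip(pred, 1e-6, None)

    if median_scale:
        # Align the global scale of the prediction to the ground truth.
        pred = pred * (np.median(gt) / np.median(pred))

    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))

    # Threshold accuracy: share of pixels with max ratio below 1.25 ** k.
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = {
        f"delta_{k}": float(100.0 * np.mean(ratio < 1.25 ** k))
        for k in (1, 2, 3)
    }

    return {"abs_rel": abs_rel, "rmse": rmse, **deltas}
```

Calling `depth_metrics(pred, gt)` on a predicted and a ground-truth depth map of the same shape returns a dictionary with keys matching the table's metric names. Lower is better for AbsRel and RMSE; higher is better for the delta accuracies.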