Jamie Watson, Michael Firman, Gabriel J. Brostow, Daniyar Turmukhambetov
Monocular depth estimators can be trained with various forms of self-supervision from binocular-stereo data to circumvent the need for high-quality laser scans or other ground-truth data. The disadvantage, however, is that the photometric reprojection losses used with self-supervised learning typically have multiple local minima. These plausible-looking alternatives to ground truth can restrict what a regression network learns, causing it to predict depth maps of limited quality. As one prominent example, depth discontinuities around thin structures are often incorrectly estimated by current state-of-the-art methods. Here, we study the problem of ambiguous reprojections in depth prediction from stereo-based self-supervision, and introduce Depth Hints to alleviate their effects. Depth Hints are complementary depth suggestions obtained from simple off-the-shelf stereo algorithms. These hints enhance an existing photometric loss function, and are used to guide a network to learn better weights. They require no additional data, and are assumed to be right only sometimes. We show that using our Depth Hints gives a substantial boost when training several leading self-supervised-from-stereo models, not just our own. Further, combined with other good practices, we produce state-of-the-art depth predictions on the KITTI benchmark.
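
The idea that hints are "assumed to be right only sometimes" can be illustrated with a small sketch. It assumes, beyond what the abstract states, that a hint is trusted at a pixel only when it yields a lower photometric reprojection loss than the network's current prediction, and that the extra supervised term is a log-L1 distance between prediction and hint. The function name `hinted_loss` and its arguments are hypothetical, and the inputs are flat per-pixel lists rather than image tensors.

```python
import math


def hinted_loss(photo_loss_pred, photo_loss_hint, depth_pred, depth_hint):
    """Mean per-pixel loss with selective use of depth hints (illustrative sketch).

    photo_loss_pred: photometric reprojection loss using the predicted depth
    photo_loss_hint: photometric reprojection loss using the hint depth
    depth_pred, depth_hint: predicted and hinted depth values
    All four are equal-length sequences of floats, one entry per pixel.
    """
    total = 0.0
    for lp, lh, dp, dh in zip(photo_loss_pred, photo_loss_hint,
                              depth_pred, depth_hint):
        loss = lp  # the photometric term is always applied
        if lh < lp:
            # Assumption: the hint explains the image better at this pixel,
            # so pull the prediction towards it with a log-L1 penalty.
            loss += math.log(1.0 + abs(dp - dh))
        total += loss
    return total / len(photo_loss_pred)


# Pixel 0 ignores its hint (0.3 >= 0.2); pixel 1 uses it (0.1 < 0.5).
example = hinted_loss([0.2, 0.5], [0.3, 0.1], [10.0, 4.0], [12.0, 5.0])
```

Under these assumptions a wrong hint costs nothing: wherever it reprojects worse than the current prediction, the loss falls back to the plain photometric term.
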
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | KITTI Eigen split | Absolute relative error (AbsRel) | 0.096 | Depth Hints |
| Depth Estimation | VA (Virtual Apartment) | Absolute relative error (AbsRel) | 0.197 | Depth Hints |
| Depth Estimation | VA (Virtual Apartment) | Log root mean square error (RMSE_log) | 0.248 | Depth Hints |
| Depth Estimation | VA (Virtual Apartment) | Mean absolute error (MAE) | 0.291 | Depth Hints |
| Depth Estimation | VA (Virtual Apartment) | Root mean square error (RMSE) | 0.427 | Depth Hints |
| 3D | KITTI Eigen split | Absolute relative error (AbsRel) | 0.096 | Depth Hints |
| 3D | VA (Virtual Apartment) | Absolute relative error (AbsRel) | 0.197 | Depth Hints |
| 3D | VA (Virtual Apartment) | Log root mean square error (RMSE_log) | 0.248 | Depth Hints |
| 3D | VA (Virtual Apartment) | Mean absolute error (MAE) | 0.291 | Depth Hints |
| 3D | VA (Virtual Apartment) | Root mean square error (RMSE) | 0.427 | Depth Hints |
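
For reference, the four metrics in the table can be computed as in the sketch below, using their standard definitions (MAE is taken here as the mean absolute error). The function name `depth_metrics` is hypothetical, and the sketch omits benchmark-specific steps such as depth capping, cropping, or masking of invalid pixels, so it is not expected to reproduce the reported values on its own.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth-error metrics over paired, strictly positive depths.

    pred, gt: equal-length sequences of predicted and ground-truth depths.
    Returns a dict with AbsRel, RMSE, RMSE_log, and MAE.
    """
    n = len(gt)
    abs_err = [abs(p - g) for p, g in zip(pred, gt)]
    return {
        # mean of |pred - gt| / gt
        "abs_rel": sum(e / g for e, g in zip(abs_err, gt)) / n,
        # square root of the mean squared error
        "rmse": math.sqrt(sum(e * e for e in abs_err) / n),
        # RMSE computed on log-depths
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)) / n
        ),
        # mean of |pred - gt|
        "mae": sum(abs_err) / n,
    }


metrics = depth_metrics(pred=[2.0, 4.0], gt=[1.0, 4.0])
```

Lower is better for all four quantities.
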