Digging Into Self-Supervised Monocular Depth Estimation

Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow

2018-06-04

Motion Estimation, Self-Supervised Learning, Image Reconstruction, Unsupervised Monocular Depth Estimation, Scene Understanding, Camera Pose Estimation, Depth Estimation, Monocular Depth Estimation

Abstract

Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
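
Contributions (i) and (iii) can be illustrated with a minimal NumPy sketch. It goes beyond what the abstract states and rests on these assumptions: the source frames have already been warped into the target view using the predicted depth and pose (the warping itself is not shown); the photometric error is a plain per-pixel L1 difference, used here for brevity in place of a combined SSIM and L1 term; and the function names `photometric_error` and `min_reprojection_loss` are hypothetical. It should be read as an illustration of the idea, not as the authors' implementation.

```python
import numpy as np


def photometric_error(pred, target):
    """Per-pixel L1 error averaged over colour channels: (H, W, 3) -> (H, W)."""
    return np.abs(pred - target).mean(axis=-1)


def min_reprojection_loss(target, warped_sources, raw_sources):
    """Minimum reprojection loss with auto-masking (simplified sketch).

    target:         (H, W, 3) target frame
    warped_sources: list of (H, W, 3) source frames warped into the target view
                    (assumed to be computed elsewhere from depth and pose)
    raw_sources:    list of (H, W, 3) unwarped source frames
    """
    # Error of each warped source against the target: (N, H, W).
    warped_err = np.stack([photometric_error(w, target) for w in warped_sources])
    # Error of each unwarped source against the target: (N, H, W).
    identity_err = np.stack([photometric_error(s, target) for s in raw_sources])

    # (i) Take the per-pixel minimum over source frames instead of the mean,
    # so a pixel occluded in one source can still be matched in another.
    min_warped = warped_err.min(axis=0)
    min_identity = identity_err.min(axis=0)

    # (iii) Auto-mask: keep only pixels where warping helps. Pixels that already
    # match the unwarped source (static camera, objects moving with the camera)
    # are excluded from the loss.
    mask = min_warped < min_identity

    # Average the masked error; guard against an empty mask.
    loss = (min_warped * mask).sum() / max(mask.sum(), 1)
    return loss, mask
```

Taking the minimum rather than the mean means a pixel only needs to be visible in one source frame to contribute a low error, while the mask drops pixels whose appearance does not change between frames.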

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	KITTI Eigen split	absolute relative error	0.106	monodepth2 M
Depth Estimation	Mid-Air Dataset	Abs Rel	0.717	Monodepth2
Depth Estimation	Mid-Air Dataset	RMSE	74.552	Monodepth2
Depth Estimation	Mid-Air Dataset	RMSE log	0.882	Monodepth2
Depth Estimation	Mid-Air Dataset	SQ Rel	37.164	Monodepth2
Depth Estimation	VA (Virtual Apartment)	Absolute relative error (AbsRel)	0.203	MonoDepth2
Depth Estimation	VA (Virtual Apartment)	Log root mean square error (RMSE_log)	0.251	MonoDepth2
Depth Estimation	VA (Virtual Apartment)	Mean average error (MAE)	0.295	MonoDepth2
Depth Estimation	VA (Virtual Apartment)	Root mean square error (RMSE)	0.432	MonoDepth2
Depth Estimation	Make3D	Abs Rel	0.322	Monodepth2
Depth Estimation	Make3D	RMSE	7.417	Monodepth2
Depth Estimation	Make3D	Sq Rel	3.589	Monodepth2
3D	KITTI Eigen split	absolute relative error	0.106	monodepth2 M
3D	Mid-Air Dataset	Abs Rel	0.717	Monodepth2
3D	Mid-Air Dataset	RMSE	74.552	Monodepth2
3D	Mid-Air Dataset	RMSE log	0.882	Monodepth2
3D	Mid-Air Dataset	SQ Rel	37.164	Monodepth2
3D	VA (Virtual Apartment)	Absolute relative error (AbsRel)	0.203	MonoDepth2
3D	VA (Virtual Apartment)	Log root mean square error (RMSE_log)	0.251	MonoDepth2
3D	VA (Virtual Apartment)	Mean average error (MAE)	0.295	MonoDepth2
3D	VA (Virtual Apartment)	Root mean square error (RMSE)	0.432	MonoDepth2
3D	Make3D	Abs Rel	0.322	Monodepth2
3D	Make3D	RMSE	7.417	Monodepth2
3D	Make3D	Sq Rel	3.589	Monodepth2
Camera Pose Estimation	KITTI Odometry Benchmark	Absolute Trajectory Error [m]	93.04	Monodepth2
Camera Pose Estimation	KITTI Odometry Benchmark	Average Rotational Error er[%]	20.72	Monodepth2
Camera Pose Estimation	KITTI Odometry Benchmark	Average Translational Error et[%]	43.21	Monodepth2
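
The depth error metrics named in the table can be sketched as follows, assuming the standard definitions commonly used for depth evaluation. The function name `depth_metrics` is hypothetical, and each benchmark's exact protocol (valid-pixel masks, depth caps, scale alignment of monocular predictions) is not stated on this page and is not reproduced here.

```python
import numpy as np


def depth_metrics(gt, pred):
    """Standard depth error metrics over valid pixels (both inputs must be > 0)."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    diff = gt - pred
    return {
        # Absolute relative error: mean of |gt - pred| / gt
        "abs_rel": np.mean(np.abs(diff) / gt),
        # Squared relative error: mean of (gt - pred)^2 / gt
        "sq_rel": np.mean(diff ** 2 / gt),
        # Mean absolute error
        "mae": np.mean(np.abs(diff)),
        # Root mean square error
        "rmse": np.sqrt(np.mean(diff ** 2)),
        # Root mean square error of the log depths
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
    }


# Usage on a toy example with three valid pixels.
metrics = depth_metrics([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
```

Lower is better for every metric in this sketch. The relative metrics are unitless, while RMSE and MAE are in the depth units of the dataset.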
