
Digging Into Self-Supervised Monocular Depth Estimation

Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow

Date: 2018-06-04
Tasks: Motion Estimation, Self-Supervised Learning, Image Reconstruction, Unsupervised Monocular Depth Estimation, Scene Understanding, Camera Pose Estimation, Depth Estimation, Monocular Depth Estimation

Abstract

Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
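
The abstract names a minimum reprojection loss and an auto-masking loss but does not spell out how they are computed. The following is a minimal sketch of one common reading of those two ideas, not the authors' implementation. It assumes the per-pixel photometric errors have already been computed elsewhere and are passed in as flat lists; the function name `min_reprojection_loss`, its argument names, and the choice to average over unmasked pixels only are all hypothetical.

```python
def min_reprojection_loss(warped_errors, identity_errors):
    """Sketch of a per-pixel minimum reprojection loss with auto-masking.

    warped_errors:   one list per source frame; entry i is the photometric
                     error at pixel i between the target image and that
                     source frame warped into the target view.
    identity_errors: same layout, but the error is measured against the
                     unwarped source frame.
    Returns (loss, mask), where mask[i] is True if pixel i is kept.
    """
    if not warped_errors or len(warped_errors) != len(identity_errors):
        raise ValueError("need matching, non-empty lists of source frames")
    num_pixels = len(warped_errors[0])

    # Minimum over source frames, taken independently at each pixel, so a
    # pixel occluded in one source frame can still be matched in another.
    min_warped = [min(f[i] for f in warped_errors) for i in range(num_pixels)]
    min_identity = [min(f[i] for f in identity_errors) for i in range(num_pixels)]

    # Auto-mask: keep a pixel only if warping lowers its error compared with
    # the unwarped source. Pixels that look the same without warping (for
    # example a static camera, or an object moving with the camera) are dropped.
    mask = [w < s for w, s in zip(min_warped, min_identity)]

    # Assumption: normalise by the number of kept pixels.
    kept = [w for w, keep in zip(min_warped, mask) if keep]
    loss = sum(kept) / len(kept) if kept else 0.0
    return loss, mask


# Two source frames, three pixels (illustrative numbers only).
warped = [[0.1, 0.5, 0.3], [0.4, 0.2, 0.3]]
identity = [[0.6, 0.6, 0.1], [0.7, 0.5, 0.2]]
loss, mask = min_reprojection_loss(warped, identity)
print(mask)
```

In this toy input the third pixel is masked out because its unwarped error (0.1) is already lower than its best warped error (0.3). A real training loss would also include a photometric error function and a smoothness term, which are omitted here.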

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.106 | monodepth2 M |
| Depth Estimation | Mid-Air Dataset | Abs Rel | 0.717 | Monodepth2 |
| Depth Estimation | Mid-Air Dataset | RMSE | 74.552 | Monodepth2 |
| Depth Estimation | Mid-Air Dataset | RMSE log | 0.882 | Monodepth2 |
| Depth Estimation | Mid-Air Dataset | SQ Rel | 37.164 | Monodepth2 |
| Depth Estimation | VA (Virtual Apartment) | Absolute relative error (AbsRel) | 0.203 | MonoDepth2 |
| Depth Estimation | VA (Virtual Apartment) | Log root mean square error (RMSE_log) | 0.251 | MonoDepth2 |
| Depth Estimation | VA (Virtual Apartment) | Mean average error (MAE) | 0.295 | MonoDepth2 |
| Depth Estimation | VA (Virtual Apartment) | Root mean square error (RMSE) | 0.432 | MonoDepth2 |
| Depth Estimation | Make3D | Abs Rel | 0.322 | Monodepth2 |
| Depth Estimation | Make3D | RMSE | 7.417 | Monodepth2 |
| Depth Estimation | Make3D | Sq Rel | 3.589 | Monodepth2 |
| 3D | KITTI Eigen split | absolute relative error | 0.106 | monodepth2 M |
| 3D | Mid-Air Dataset | Abs Rel | 0.717 | Monodepth2 |
| 3D | Mid-Air Dataset | RMSE | 74.552 | Monodepth2 |
| 3D | Mid-Air Dataset | RMSE log | 0.882 | Monodepth2 |
| 3D | Mid-Air Dataset | SQ Rel | 37.164 | Monodepth2 |
| 3D | VA (Virtual Apartment) | Absolute relative error (AbsRel) | 0.203 | MonoDepth2 |
| 3D | VA (Virtual Apartment) | Log root mean square error (RMSE_log) | 0.251 | MonoDepth2 |
| 3D | VA (Virtual Apartment) | Mean average error (MAE) | 0.295 | MonoDepth2 |
| 3D | VA (Virtual Apartment) | Root mean square error (RMSE) | 0.432 | MonoDepth2 |
| 3D | Make3D | Abs Rel | 0.322 | Monodepth2 |
| 3D | Make3D | RMSE | 7.417 | Monodepth2 |
| 3D | Make3D | Sq Rel | 3.589 | Monodepth2 |
| Camera Pose Estimation | KITTI Odometry Benchmark | Absolute Trajectory Error [m] | 93.04 | Monodepth2 |
| Camera Pose Estimation | KITTI Odometry Benchmark | Average Rotational Error er[%] | 20.72 | Monodepth2 |
| Camera Pose Estimation | KITTI Odometry Benchmark | Average Translational Error et[%] | 43.21 | Monodepth2 |
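
The page does not define the depth-error metrics it reports. As a reading aid, the sketch below computes Abs Rel, Sq Rel, RMSE, and RMSE log using the definitions commonly used in the depth-estimation literature. It assumes those definitions apply to the entries above, which the page does not confirm, and it ignores evaluation details such as depth capping, cropping, or scale alignment. The function name `depth_metrics` is hypothetical.

```python
import math


def depth_metrics(gt, pred):
    """Common depth-error metrics over paired, strictly positive depths.

    gt and pred are equal-length sequences of ground-truth and predicted
    depths for the valid pixels only.
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and the same length")
    n = len(gt)
    pairs = list(zip(gt, pred))

    # Absolute relative error: mean of |gt - pred| / gt.
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    # Squared relative error: mean of (gt - pred)^2 / gt.
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    # Root mean squared error in depth units.
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    # Root mean squared error of the log-depth difference.
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
    }


# Illustrative values only, not taken from any dataset.
print(depth_metrics([2.0, 4.0], [1.0, 5.0]))
```

Abs Rel and Sq Rel are normalised by the ground-truth depth, while RMSE is in the dataset's depth units, which is one reason values across different datasets in the table are not directly comparable.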
