NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis

Zuria Bauer, Zuoyue Li, Sergio Orts-Escolano, Miguel Cazorla, Marc Pollefeys, Martin R. Oswald

2021-12-22

Novel View Synthesis, Depth Prediction, Depth Estimation, Image Generation, Monocular Depth Estimation

Abstract

Building upon the recent progress in novel view synthesis, we propose its application to improve monocular depth estimation. In particular, we propose a novel training method split into three main steps. First, the prediction results of a monocular depth network are warped to an additional viewpoint. Second, we apply an additional image synthesis network, which corrects and improves the quality of the warped RGB image. The output of this network is required to look as similar as possible to the ground-truth view by minimizing the pixel-wise RGB reconstruction error. Third, we reapply the same monocular depth estimation network to the synthesized second viewpoint and ensure that the depth predictions are consistent with the associated ground-truth depth. Experimental results show that our method achieves state-of-the-art or comparable performance on the KITTI and NYU-Depth-v2 datasets with a lightweight and simple vanilla U-Net architecture.
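The three steps described in the abstract can be sketched as a single training objective. This is only an illustrative outline, not the authors' implementation: `depth_net`, `warp`, and `synth_net` are hypothetical callables standing in for the depth network, the depth-based warping operation, and the image synthesis network. The use of a mean absolute error for both terms and the weights `rgb_weight` and `depth_weight` are assumptions, since the abstract names only a pixel-wise RGB reconstruction error and a depth consistency requirement. Images and depth maps are represented as flat sequences of floats to keep the example in plain Python.

```python
from typing import Callable, Sequence

Values = Sequence[float]  # toy stand-in for a flattened image or depth map


def mean_abs_error(pred: Values, target: Values) -> float:
    """Mean absolute difference between two equally sized sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def nvs_training_loss(
    rgb_src: Values,
    rgb_tgt_gt: Values,
    depth_tgt_gt: Values,
    depth_net: Callable[[Values], Values],       # hypothetical depth network
    warp: Callable[[Values, Values], Values],    # hypothetical warp to view 2
    synth_net: Callable[[Values], Values],       # hypothetical synthesis network
    rgb_weight: float = 1.0,                     # assumed loss weight
    depth_weight: float = 1.0,                   # assumed loss weight
) -> float:
    # Step 1: predict depth for the source view and warp the source image
    # to the second viewpoint using that predicted depth.
    depth_src = depth_net(rgb_src)
    rgb_warped = warp(rgb_src, depth_src)

    # Step 2: refine the warped image and compare it with the
    # ground-truth second view (pixel-wise RGB reconstruction error).
    rgb_synth = synth_net(rgb_warped)
    rgb_loss = mean_abs_error(rgb_synth, rgb_tgt_gt)

    # Step 3: reapply the same depth network to the synthesized view and
    # compare its prediction with the ground-truth depth of that view.
    depth_tgt = depth_net(rgb_synth)
    depth_loss = mean_abs_error(depth_tgt, depth_tgt_gt)

    return rgb_weight * rgb_loss + depth_weight * depth_loss
```

Because the same `depth_net` is used in steps 1 and 3, an error in the first depth prediction distorts the warped image and therefore affects both loss terms, which is how the second view supplies extra supervision in this sketch.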

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	NYU-Depth V2	RMSE	0.331	NVS-MonoDepth
Depth Estimation	KITTI Eigen split	RMSE	2.702	NVS-MonoDepth
Depth Estimation	KITTI Eigen split	absolute relative error	0.057	NVS-MonoDepth
3D	NYU-Depth V2	RMSE	0.331	NVS-MonoDepth
3D	KITTI Eigen split	RMSE	2.702	NVS-MonoDepth
3D	KITTI Eigen split	absolute relative error	0.057	NVS-MonoDepth
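For reference, the two metrics in the table are commonly computed as follows: RMSE is the square root of the mean squared difference between predicted and ground-truth depth, and absolute relative error is the mean of the absolute difference divided by the ground-truth depth. The sketch below assumes flat sequences of depth values and treats pixels with non-positive ground truth as invalid. The function name `depth_metrics` and this masking rule are illustrative assumptions, and the exact evaluation protocol (depth caps, crops) used for the reported numbers is not given here.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE and absolute relative error over valid pixels (gt > 0)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Keep only pixels that have a valid (positive) ground-truth depth.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    n = len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    return {"rmse": rmse, "abs_rel": abs_rel}
```

Lower is better for both metrics. RMSE is expressed in the same unit as the depth values, while absolute relative error is unitless.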
