Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

2023-12-20
Tags: Denoising, Depth Estimation, Monocular Depth Estimation

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: a log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic augmentation of the FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
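To make the log-scale depth parameterization and the role of the FOV more concrete, here is a minimal sketch. It is not the authors' implementation. The depth range `D_MIN`/`D_MAX`, the mapping to [-1, 1], and all function names are assumptions chosen for illustration; the abstract does not specify them. The FOV formulas are the standard pinhole-camera relations, and center cropping is shown only as one plausible way to vary the FOV synthetically.

```python
import math

# Assumed depth range in metres; the abstract does not state these values.
D_MIN, D_MAX = 0.5, 80.0

def encode_log_depth(depth_m):
    """Map metric depth to [-1, 1] on a log scale, clamped to the assumed range."""
    d = min(max(depth_m, D_MIN), D_MAX)
    unit = math.log(d / D_MIN) / math.log(D_MAX / D_MIN)  # value in [0, 1]
    return 2.0 * unit - 1.0

def decode_log_depth(code):
    """Invert encode_log_depth: map a value in [-1, 1] back to metres."""
    unit = (code + 1.0) / 2.0
    return D_MIN * math.exp(unit * math.log(D_MAX / D_MIN))

def vertical_fov_rad(focal_px, height_px):
    """Vertical field of view of a pinhole camera, focal length given in pixels."""
    return 2.0 * math.atan(height_px / (2.0 * focal_px))

def fov_after_center_crop(fov_rad, crop_fraction):
    """FOV of a center crop that keeps crop_fraction of the image height."""
    return 2.0 * math.atan(crop_fraction * math.tan(fov_rad / 2.0))
```

Under these assumed bounds, depths from 0.5 m to 5 m occupy roughly 45% of the encoded range, whereas on a linear scale they would occupy under 6%. This illustrates why a log scale can help a single model cover both indoor and outdoor depth ranges.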

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	NYU-Depth V2	Delta < 1.25	0.953	DMD
Depth Estimation	NYU-Depth V2	Delta < 1.25^2	0.989	DMD
Depth Estimation	NYU-Depth V2	Delta < 1.25^3	0.996	DMD
Depth Estimation	NYU-Depth V2	RMSE	0.296	DMD
Depth Estimation	NYU-Depth V2	absolute relative error	0.072	DMD
Depth Estimation	NYU-Depth V2	log 10	0.031	DMD
3D	NYU-Depth V2	Delta < 1.25	0.953	DMD
3D	NYU-Depth V2	Delta < 1.25^2	0.989	DMD
3D	NYU-Depth V2	Delta < 1.25^3	0.996	DMD
3D	NYU-Depth V2	RMSE	0.296	DMD
3D	NYU-Depth V2	absolute relative error	0.072	DMD
3D	NYU-Depth V2	log 10	0.031	DMD
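The metrics in the table follow the definitions commonly used for depth benchmarks. The sketch below shows how they are typically computed from paired predicted and ground-truth depths. The function name and the rule of skipping non-positive values are assumptions; the exact evaluation protocol used for these NYU-Depth V2 numbers (masking, cropping, depth caps) is not given here.

```python
import math

def depth_metrics(pred, gt):
    """Common depth metrics over paired sequences of depths in metres.

    Pairs with a non-positive value are skipped (an assumed validity rule).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid depth pairs")
    # Threshold accuracy uses the larger of the two depth ratios per pixel.
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }
```

For the delta metrics higher is better; for RMSE, absolute relative error, and log10 error lower is better.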
