Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

2023-12-20 · Denoising · Depth Estimation · Monocular Depth Estimation
Paper · PDF

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity, and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common, and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview see https://diffusion-vision.github.io/dmd
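The log-scale depth parameterization mentioned above can be illustrated with a minimal sketch: metric depth is mapped into a fixed range in log space, so that indoor scenes (meters) and outdoor scenes (tens of meters) occupy comparable portions of the model's output range. The bounds `D_MIN`/`D_MAX` and function names below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Assumed depth bounds for illustration only (not from the paper).
D_MIN, D_MAX = 0.5, 80.0

def depth_to_logscale(depth_m: np.ndarray) -> np.ndarray:
    """Normalize metric depth (meters) to [-1, 1] via log scaling.

    In log space, the indoor range (e.g. 0.5-10 m) and the outdoor
    range (e.g. 10-80 m) each cover a substantial span of [-1, 1],
    which a linear mapping would not achieve.
    """
    d = np.clip(depth_m, D_MIN, D_MAX)
    t = (np.log(d) - np.log(D_MIN)) / (np.log(D_MAX) - np.log(D_MIN))
    return 2.0 * t - 1.0

def logscale_to_depth(x: np.ndarray) -> np.ndarray:
    """Invert the normalization back to metric depth in meters."""
    t = (np.asarray(x) + 1.0) / 2.0
    return np.exp(t * (np.log(D_MAX) - np.log(D_MIN)) + np.log(D_MIN))
```

A round trip through the two functions recovers the original depth for any value inside the bounds, which is the property the denoiser relies on at inference time.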

Results

Task              Dataset       Metric                   Value  Model
Depth Estimation  NYU-Depth V2  Delta < 1.25             0.953  DMD
Depth Estimation  NYU-Depth V2  Delta < 1.25^2           0.989  DMD
Depth Estimation  NYU-Depth V2  Delta < 1.25^3           0.996  DMD
Depth Estimation  NYU-Depth V2  RMSE                     0.296  DMD
Depth Estimation  NYU-Depth V2  Absolute relative error  0.072  DMD
Depth Estimation  NYU-Depth V2  log10                    0.031  DMD
3D                NYU-Depth V2  Delta < 1.25             0.953  DMD
3D                NYU-Depth V2  Delta < 1.25^2           0.989  DMD
3D                NYU-Depth V2  Delta < 1.25^3           0.996  DMD
3D                NYU-Depth V2  RMSE                     0.296  DMD
3D                NYU-Depth V2  Absolute relative error  0.072  DMD
3D                NYU-Depth V2  log10                    0.031  DMD
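The metrics in the table are the standard monocular depth evaluation measures: absolute relative error (REL), RMSE, mean log10 error, and the delta-threshold accuracies, i.e. the fraction of pixels where max(pred/gt, gt/pred) falls below 1.25^k. A minimal sketch of how they are typically computed (function name is illustrative):

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard depth metrics over flattened prediction/ground-truth arrays.

    delta_k is the fraction of pixels whose prediction is within a
    multiplicative factor of 1.25^k of the ground truth.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "REL":    float(np.mean(np.abs(pred - gt) / gt)),
        "RMSE":   float(np.sqrt(np.mean((pred - gt) ** 2))),
        "log10":  float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }
```

For example, a prediction that overestimates every pixel by 30% would score delta1 = 0 (since 1.3 > 1.25) but delta2 = 1 (since 1.3 < 1.25^2 = 1.5625), which is why the table reports all three thresholds.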

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)