Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller

Published: 2023-02-23 · Tasks: Zero-shot Generalization, Depth Estimation, Monocular Depth Estimation

Abstract

This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on twelve datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve SOTA for a total of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance and achieve unprecedented zero-shot generalization performance to eight unseen datasets from both indoor and outdoor domains. The code and pre-trained models are publicly available at https://github.com/isl-org/ZoeDepth .
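The routing described in the abstract (a latent classifier automatically sends each input image to the appropriate domain-specific metric head) can be sketched as follows. This is an illustrative toy in NumPy, not the authors' implementation: the class names, feature dimensions, and the exp-scale head are all hypothetical stand-ins for the paper's metric bins module.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

class ToyMetricHead:
    """Hypothetical per-domain head: maps a latent feature vector to a
    positive scale applied to a relative depth map (stand-in for the
    paper's metric bins module)."""
    def __init__(self, dim):
        self.w = rng.normal(size=dim)

    def __call__(self, latent, rel_depth):
        scale = np.exp(self.w @ latent)  # exp keeps the metric scale positive
        return scale * rel_depth

class LatentRouter:
    """Latent classifier: picks the head (e.g. indoor vs. outdoor domain)
    with the highest logit for this image's latent features."""
    def __init__(self, dim, n_heads=2):
        self.W = rng.normal(size=(n_heads, dim))
        self.heads = [ToyMetricHead(dim) for _ in range(n_heads)]

    def __call__(self, latent, rel_depth):
        probs = softmax(self.W @ latent)
        head = self.heads[int(np.argmax(probs))]  # hard routing at inference
        return head(latent, rel_depth), probs
```

The point of this structure is that the relative-depth backbone stays shared across domains, while only the lightweight heads are domain-specific, which is what allows joint NYU Depth v2 + KITTI training without one domain's scale corrupting the other's.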

Results

Task              Dataset       Metric                    Value   Model
Depth Estimation  NYU-Depth V2  delta < 1.25              0.955   ZoeD-M12-N
Depth Estimation  NYU-Depth V2  delta < 1.25^2            0.995   ZoeD-M12-N
Depth Estimation  NYU-Depth V2  delta < 1.25^3            0.999   ZoeD-M12-N
Depth Estimation  NYU-Depth V2  RMSE                      0.27    ZoeD-M12-N
Depth Estimation  NYU-Depth V2  absolute relative error   0.075   ZoeD-M12-N
Depth Estimation  NYU-Depth V2  log10                     0.032   ZoeD-M12-N
3D                NYU-Depth V2  delta < 1.25              0.955   ZoeD-M12-N
3D                NYU-Depth V2  delta < 1.25^2            0.995   ZoeD-M12-N
3D                NYU-Depth V2  delta < 1.25^3            0.999   ZoeD-M12-N
3D                NYU-Depth V2  RMSE                      0.27    ZoeD-M12-N
3D                NYU-Depth V2  absolute relative error   0.075   ZoeD-M12-N
3D                NYU-Depth V2  log10                     0.032   ZoeD-M12-N
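The metrics in the table are the standard monocular depth-estimation suite: threshold accuracy (the fraction of pixels where the larger of pred/gt and gt/pred is below 1.25, 1.25^2, or 1.25^3), absolute relative error (REL), RMSE, and mean log10 error. They can be computed from a pair of depth maps as follows (valid-pixel masking is assumed to have been done by the caller):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth-estimation metrics, as reported in the
    results table above. `pred` and `gt` are arrays of predicted and
    ground-truth depths in metres, restricted to valid pixels (gt > 0)."""
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "d1": np.mean(ratio < 1.25),        # delta < 1.25
        "d2": np.mean(ratio < 1.25 ** 2),   # delta < 1.25^2
        "d3": np.mean(ratio < 1.25 ** 3),   # delta < 1.25^3
        "rel": np.mean(np.abs(pred - gt) / gt),      # absolute relative error
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),  # root mean squared error
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
    }
```

For example, a prediction that overestimates every depth by 30% scores 0 on delta < 1.25 (since 1.3 > 1.25) but 1.0 on delta < 1.25^2, and has REL = 0.30.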

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
Cameras as Relative Positional Encoding (2025-07-14)