Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, Luc van Gool

2025-02-27 · Depth Estimation · Monocular Depth Estimation
Paper · PDF · Code (official)

Abstract

Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepthV2, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from the existing MMDE paradigm, UniDepthV2 directly predicts metric 3D points from the input image at inference time without any additional information, striving for a universal and flexible MMDE solution. In particular, UniDepthV2 implements a self-promptable camera module predicting a dense camera representation to condition depth features. Our model exploits a pseudo-spherical output representation, which disentangles the camera and depth representations. In addition, we propose a geometric invariance loss that promotes the invariance of camera-prompted depth features. UniDepthV2 improves its predecessor UniDepth model via a new edge-guided loss which enhances the localization and sharpness of edges in the metric depth outputs, a revisited, simplified and more efficient architectural design, and an additional uncertainty-level output which enables downstream tasks requiring confidence. Thorough evaluations on ten depth datasets in a zero-shot regime consistently demonstrate the superior performance and generalization of UniDepthV2. Code and models are available at https://github.com/lpiccinelli-eth/UniDepth
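The abstract describes recovering a metric 3D scene from a single image: the model predicts both a dense camera representation and metric depth, from which 3D points follow by back-projection. As a minimal, self-contained sketch of that last step (standard pinhole back-projection with numpy, not the paper's pseudo-spherical representation; `backproject`, its arguments, and the toy inputs are illustrative assumptions, not the UniDepth API):

```python
import numpy as np

def backproject(depth, K):
    """Lift a metric depth map to a 3D point cloud in camera coordinates.

    depth: (H, W) array of metric depths along the optical axis.
    K:     (3, 3) pinhole intrinsics matrix.
    Returns an (H, W, 3) array of 3D points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    # Unit-z viewing rays: K^{-1} [u, v, 1]^T.
    rays = np.linalg.inv(K) @ pix.astype(float)
    # Scale each ray by its metric depth (rays have z = 1).
    pts = rays * depth.reshape(1, -1)
    return pts.T.reshape(H, W, 3)

# Toy example: identity intrinsics, unit depth everywhere.
points = backproject(np.ones((2, 2)), np.eye(3))
```

UniDepthV2's contribution is that both inputs to this step, the depth map and the camera, are predicted from the image alone; the disentangled output representation means either can be swapped for ground truth when available.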

Results

Task | Dataset | Metric | Value | Model
Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.988 | UniDepthV2 (FT, metric)
Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.998 | UniDepthV2 (FT, metric)
Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 1 | UniDepthV2 (FT, metric)
Depth Estimation | NYU-Depth V2 | RMSE | 0.18 | UniDepthV2 (FT, metric)
Depth Estimation | NYU-Depth V2 | absolute relative error | 0.046 | UniDepthV2 (FT, metric)
Depth Estimation | NYU-Depth V2 | log 10 | 0.02 | UniDepthV2 (FT, metric)
Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.989 | UniDepthV2 (FT, metric)
Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | UniDepthV2 (FT, metric)
Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 0.999 | UniDepthV2 (FT, metric)
Depth Estimation | KITTI Eigen split | RMSE | 1.71 | UniDepthV2 (FT, metric)
Depth Estimation | KITTI Eigen split | RMSE log | 0.061 | UniDepthV2 (FT, metric)
Depth Estimation | KITTI Eigen split | absolute relative error | 0.037 | UniDepthV2 (FT, metric)
3D | NYU-Depth V2 | Delta < 1.25 | 0.988 | UniDepthV2 (FT, metric)
3D | NYU-Depth V2 | Delta < 1.25^2 | 0.998 | UniDepthV2 (FT, metric)
3D | NYU-Depth V2 | Delta < 1.25^3 | 1 | UniDepthV2 (FT, metric)
3D | NYU-Depth V2 | RMSE | 0.18 | UniDepthV2 (FT, metric)
3D | NYU-Depth V2 | absolute relative error | 0.046 | UniDepthV2 (FT, metric)
3D | NYU-Depth V2 | log 10 | 0.02 | UniDepthV2 (FT, metric)
3D | KITTI Eigen split | Delta < 1.25 | 0.989 | UniDepthV2 (FT, metric)
3D | KITTI Eigen split | Delta < 1.25^2 | 0.998 | UniDepthV2 (FT, metric)
3D | KITTI Eigen split | Delta < 1.25^3 | 0.999 | UniDepthV2 (FT, metric)
3D | KITTI Eigen split | RMSE | 1.71 | UniDepthV2 (FT, metric)
3D | KITTI Eigen split | RMSE log | 0.061 | UniDepthV2 (FT, metric)
3D | KITTI Eigen split | absolute relative error | 0.037 | UniDepthV2 (FT, metric)
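The metrics above are the standard monocular depth evaluation suite: accuracy under threshold (the fraction of pixels where max(gt/pred, pred/gt) falls below 1.25, 1.25², 1.25³), absolute relative error, RMSE, RMSE in log space, and mean log10 error. A minimal numpy sketch of how they are conventionally computed over valid (positive-depth) pixels; the function name and structure here are illustrative, not taken from the UniDepth evaluation code:

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard monocular depth metrics over matched arrays of positive depths."""
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Threshold accuracy: ratio of pixels within a factor of 1.25^k of ground truth.
    ratio = np.maximum(gt / pred, pred / gt)
    d1 = float((ratio < 1.25).mean())
    d2 = float((ratio < 1.25 ** 2).mean())
    d3 = float((ratio < 1.25 ** 3).mean())
    abs_rel = float((np.abs(gt - pred) / gt).mean())          # absolute relative error
    rmse = float(np.sqrt(((gt - pred) ** 2).mean()))          # RMSE (metric units)
    rmse_log = float(np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean()))  # RMSE log
    log10 = float(np.abs(np.log10(gt) - np.log10(pred)).mean())           # log 10
    return {"d1": d1, "d2": d2, "d3": d3, "abs_rel": abs_rel,
            "rmse": rmse, "rmse_log": rmse_log, "log10": log10}

# A perfect prediction scores d1 = 1 and zero error on every metric.
scores = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])
```

Note that RMSE is in metric units (meters on NYU and KITTI), which is why the KITTI value (1.71) is an order of magnitude larger than the indoor NYU value (0.18) despite comparable threshold accuracies.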

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
Cameras as Relative Positional Encoding (2025-07-14)
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)