
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

2023-07-20 · ICCV 2023
Tasks: Zero-shot Generalization · Image Reconstruction · Depth Estimation · Monocular Depth Estimation
Paper · PDF · Code (official)

Abstract

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models. Equipped with our module, monocular models can be stably trained with over 8 million images with thousands of camera models, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Experiments demonstrate SOTA performance of our method on 7 zero-shot benchmarks. Notably, our method won the championship in the 2nd Monocular Depth Estimation Challenge. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular-SLAM (Fig. 1), leading to high-quality metric scale dense mapping. The code is available at https://github.com/YvanYin/Metric3D.
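The abstract's key idea is resolving metric ambiguity by mapping data from arbitrary cameras into a single canonical camera space. A minimal sketch of that idea follows, assuming a pinhole model in which metric depth scales with the ratio of focal lengths; the function names and the canonical focal length value are illustrative, not the paper's actual API or hyperparameters.

```python
# Sketch of the canonical-camera transformation idea from the abstract:
# depth labels from cameras with different focal lengths are rescaled into a
# canonical camera with a fixed focal length f_canonical, so that mixed-data
# training sees a consistent depth-to-appearance relationship.
# All names and the default f_canonical are assumptions for illustration.

def depth_to_canonical(depth, focal_length, f_canonical=1000.0):
    """Scale a metric depth map (list of rows) into canonical camera space.

    Under a pinhole model, a scene point imaged by a camera with focal
    length f corresponds to depth d * (f_canonical / f) when re-imaged at
    the same pixel footprint by the canonical camera.
    """
    scale = f_canonical / focal_length
    return [[d * scale for d in row] for row in depth]

def canonical_to_metric(depth_canonical, focal_length, f_canonical=1000.0):
    """Invert the transform: recover metric depth for the real camera."""
    scale = focal_length / f_canonical
    return [[d * scale for d in row] for row in depth_canonical]
```

At inference time, a model trained entirely in the canonical space can predict canonical depth and convert it back to metric depth using only the test camera's focal length, which is how zero-shot metric prediction across unseen camera settings becomes possible.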

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.944 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.986 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.995 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.31 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| Depth Estimation | NYU-Depth V2 | Absolute relative error | 0.083 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| Depth Estimation | NYU-Depth V2 | log10 | 0.035 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.967 | Metric3D (zero-shot) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.995 | Metric3D (zero-shot) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 0.999 | Metric3D (zero-shot) |
| Depth Estimation | KITTI Eigen split | RMSE | 2.77 | Metric3D (zero-shot) |
| Depth Estimation | KITTI Eigen split | Absolute relative error | 0.058 | Metric3D (zero-shot) |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.944 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.986 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.995 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| 3D | NYU-Depth V2 | RMSE | 0.31 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| 3D | NYU-Depth V2 | Absolute relative error | 0.083 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| 3D | NYU-Depth V2 | log10 | 0.035 | Metric3D (ConvNeXt-Large, zero-shot testing) |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.967 | Metric3D (zero-shot) |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.995 | Metric3D (zero-shot) |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 0.999 | Metric3D (zero-shot) |
| 3D | KITTI Eigen split | RMSE | 2.77 | Metric3D (zero-shot) |
| 3D | KITTI Eigen split | Absolute relative error | 0.058 | Metric3D (zero-shot) |
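The metrics in the table are the standard monocular depth evaluation quantities: threshold accuracy (the fraction of pixels whose prediction/ground-truth ratio is within 1.25, 1.25^2, or 1.25^3), absolute relative error, RMSE, and mean log10 error. A minimal reference implementation, assuming flat lists of valid positive depths in metres:

```python
import math

def depth_metrics(pred, gt):
    """Compute the standard monocular depth metrics used in the table above.

    pred, gt: equal-length flat lists of positive depth values (metres).
    Returns a dict with the threshold accuracies, abs rel, RMSE, and log10.
    """
    n = len(gt)
    # max(p/g, g/p) is the symmetric ratio used for delta-threshold accuracy
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt)]
    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "abs_rel": sum(abs(p - g) / g for p, g in zip(pred, gt)) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n),
        "log10": sum(abs(math.log10(p) - math.log10(g))
                     for p, g in zip(pred, gt)) / n,
    }
```

A perfect prediction scores delta1 = 1.0 and zero error on the other metrics; higher is better for the delta accuracies, lower is better for the rest.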

Related Papers

- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
- The model is the message: Lightweight convolutional autoencoders applied to noisy imaging data for planetary science and astrobiology (2025-07-15)
- 3D Magnetic Inverse Routine for Single-Segment Magnetic Field Images (2025-07-15)