Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler

Published: 2023-12-04
Venue: CVPR 2024
Tasks: Zero-shot Generalization, Scene Understanding, Depth Estimation, Monocular Depth Estimation

Abstract

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://marigoldmonodepth.github.io.
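"Affine-invariant" in the abstract means the depth map is predicted only up to an unknown scale and shift. A common way to compare such a prediction with ground truth is to fit that scale and shift by least squares first. The sketch below shows this fit in plain Python; the function names are hypothetical, and whether it matches the exact alignment protocol behind the results on this page is an assumption.

```python
def fit_scale_shift(pred, gt):
    """Return (scale, shift) minimizing sum((scale * p + shift - g) ** 2).

    pred and gt are flat sequences of depth values for the valid pixels.
    This is ordinary simple linear regression of gt on pred.
    """
    n = len(pred)
    if n != len(gt) or n < 2:
        raise ValueError("need two equally sized sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    scale = cov / var_p
    shift = mean_g - scale * mean_p
    return scale, shift


def align(pred, gt):
    """Map an affine-invariant prediction into the units of gt."""
    scale, shift = fit_scale_shift(pred, gt)
    return [scale * p + shift for p in pred]


# Toy example (made-up values): gt is exactly 4 * pred + 2.
pred = [0.0, 0.25, 0.5, 1.0]
gt = [2.0, 3.0, 4.0, 6.0]
aligned = align(pred, gt)
```

After alignment, error metrics can be computed between `aligned` and `gt` as if the prediction were metric depth.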

Results

Task              Dataset            Metric                   Value  Model
Depth Estimation  NYU-Depth V2       Delta < 1.25             0.964  Marigold
Depth Estimation  NYU-Depth V2       Delta < 1.25^2           0.991  Marigold
Depth Estimation  NYU-Depth V2       Delta < 1.25^3           0.998  Marigold
Depth Estimation  NYU-Depth V2       RMSE                     0.224  Marigold
Depth Estimation  NYU-Depth V2       absolute relative error  0.055  Marigold
Depth Estimation  NYU-Depth V2       log10 error              0.024  Marigold
Depth Estimation  ETH3D              Delta < 1.25             0.960  Marigold
Depth Estimation  ETH3D              absolute relative error  0.065  Marigold
Depth Estimation  KITTI Eigen split  Delta < 1.25             0.916  Marigold
Depth Estimation  KITTI Eigen split  Delta < 1.25^2           0.987  Marigold
Depth Estimation  KITTI Eigen split  Delta < 1.25^3           0.996  Marigold
Depth Estimation  KITTI Eigen split  RMSE                     3.304  Marigold
Depth Estimation  KITTI Eigen split  RMSE log                 0.138  Marigold
Depth Estimation  KITTI Eigen split  absolute relative error  0.099  Marigold

The same results are also listed under the task "3D".
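The metric names in the results above follow the usual depth-estimation conventions. As a minimal sketch, assuming the standard definitions (the page itself does not spell them out), they can be computed from paired predicted and ground-truth depths as follows. The function name and the toy values are illustrative only.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth metrics over paired, strictly positive depth values."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally sized")
    if any(v <= 0 for v in pred) or any(v <= 0 for v in gt):
        raise ValueError("depths must be strictly positive")
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Threshold accuracy uses the larger of the two ratios per pixel.
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


# Toy example with made-up depths (not data from the paper).
metrics = depth_metrics([1.0, 2.0, 3.0, 8.0], [1.0, 2.0, 4.0, 4.0])
```

Higher is better for the Delta thresholds; lower is better for the error metrics. For an affine-invariant estimator, the prediction would typically be aligned to the ground truth before these are computed.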

Related Papers

Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)