Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

Suraj Patni, Aradhye Agarwal, Chetan Arora

2024-03-27 · CVPR 2024 · Depth Prediction · Depth Estimation · Monocular Depth Estimation
Paper · PDF · Code (official)

Abstract

In the absence of parallax cues, a learning-based single-image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, such models must be trained on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundation models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model pre-trained on a large dataset captures more relevant information for SIDE than the usual route of generating pseudo image captions followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone conditioned on ViT embeddings. Our proposed design establishes a new state of the art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (a 14% improvement) versus 0.069 by the previous SOTA (VPD), and on the KITTI dataset a Sq Rel error of 0.139 (a 2% improvement) versus 0.142 by the previous SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvements of (20%, 23%, 81%, 25%) over NeWCRFs on the (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io
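The core idea above, conditioning a diffusion denoiser on a global ViT image embedding rather than on CLIP text embeddings of pseudo captions, can be sketched with plain cross-attention. This is a minimal NumPy illustration, not the paper's actual CIDE module: the token count, dimensions, and projection names (`W_tokens`, `cond_tokens`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent, cond, Wq, Wk, Wv):
    """One cross-attention step: UNet latent tokens attend to conditioning tokens."""
    Q = latent @ Wq                                   # (n_latent, d)
    K = cond @ Wk                                     # (n_cond, d)
    V = cond @ Wv                                     # (n_cond, d)
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (n_latent, n_cond)
    return scores @ V                                 # (n_latent, d)

rng = np.random.default_rng(0)
d_latent, d_vit, d = 64, 768, 64
latent = rng.standard_normal((256, d_latent))    # flattened UNet feature map
vit_embedding = rng.standard_normal((1, d_vit))  # global ViT image embedding
# Expand the single ViT vector into a small set of conditioning tokens
# (standing in for the paper's learned embeddings; shapes are assumptions).
W_tokens = rng.standard_normal((d_vit, 8 * d)) * 0.02
cond_tokens = (vit_embedding @ W_tokens).reshape(8, d)
Wq = rng.standard_normal((d_latent, d)) * 0.02
Wk = rng.standard_normal((d, d)) * 0.02
Wv = rng.standard_normal((d, d)) * 0.02
out = cross_attention(latent, cond_tokens, Wq, Wk, Wv)
print(out.shape)  # (256, 64)
```

The point of the sketch is the data flow: the conditioning signal enters the denoiser only through the keys and values of cross-attention, so swapping CLIP text embeddings for ViT image embeddings changes what the latent tokens attend to without changing the backbone.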

Results

| Task | Dataset | Metric | Value | Model |
|------|---------|--------|-------|-------|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.978 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.997 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.218 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.059 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | log 10 | 0.026 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.979 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 1 | ECoDepth |
| Depth Estimation | KITTI Eigen split | RMSE | 1.966 | ECoDepth |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.074 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.139 | ECoDepth |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.048 | ECoDepth |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.978 | ECoDepth |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.997 | ECoDepth |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | ECoDepth |
| 3D | NYU-Depth V2 | RMSE | 0.218 | ECoDepth |
| 3D | NYU-Depth V2 | absolute relative error | 0.059 | ECoDepth |
| 3D | NYU-Depth V2 | log 10 | 0.026 | ECoDepth |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.979 | ECoDepth |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.998 | ECoDepth |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 1 | ECoDepth |
| 3D | KITTI Eigen split | RMSE | 1.966 | ECoDepth |
| 3D | KITTI Eigen split | RMSE log | 0.074 | ECoDepth |
| 3D | KITTI Eigen split | Sq Rel | 0.139 | ECoDepth |
| 3D | KITTI Eigen split | absolute relative error | 0.048 | ECoDepth |
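The metrics in the table (Abs Rel, Sq Rel, RMSE, RMSE log, log 10, and the Delta accuracy thresholds) are the standard monocular depth evaluation quantities. A minimal sketch of how they are typically computed, with an illustrative toy ground truth, not the paper's evaluation code:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics, as listed in the results table."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    mask = gt > 0                       # evaluate only on valid ground truth
    pred, gt = pred[mask], gt[mask]
    thresh = np.maximum(pred / gt, gt / pred)
    err = pred - gt
    return {
        "delta1":   float(np.mean(thresh < 1.25)),
        "delta2":   float(np.mean(thresh < 1.25 ** 2)),
        "delta3":   float(np.mean(thresh < 1.25 ** 3)),
        "abs_rel":  float(np.mean(np.abs(err) / gt)),
        "sq_rel":   float(np.mean(err ** 2 / gt)),
        "rmse":     float(np.sqrt(np.mean(err ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        "log10":    float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
    }

gt = np.array([1.0, 2.0, 4.0])          # toy depths in metres (illustrative)
m = depth_metrics(gt * 1.1, gt)         # uniform 10% over-prediction
print(round(m["abs_rel"], 3))           # 0.1
print(m["delta1"])                      # 1.0 (ratio 1.1 < 1.25 everywhere)
```

Note that lower is better for the error metrics (Abs Rel, Sq Rel, RMSE, RMSE log, log 10), while higher is better for the Delta thresholds, which measure the fraction of pixels whose prediction/ground-truth ratio falls below the threshold.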

Related Papers

- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
- Cameras as Relative Positional Encoding (2025-07-14)
- ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)