Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

Suraj Patni, Aradhye Agarwal, Chetan Arora

2024-03-27 · CVPR 2024 · Depth Prediction · Depth Estimation · Monocular Depth Estimation
Paper · PDF · Code (official)

Abstract

In the absence of parallax cues, a learning-based single-image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, such models must be trained on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundation models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model pre-trained on a large dataset captures more relevant information for SIDE than the usual route of generating pseudo image captions followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone conditioned on ViT embeddings. Our proposed design establishes a new state of the art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (a 14% improvement) versus 0.069 by the previous SOTA (VPD), and on the KITTI dataset a Sq Rel error of 0.139 (a 2% improvement) versus 0.142 by the previous SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvements of (20%, 23%, 81%, 25%) over NeWCRFs on the (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io
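The core idea above, conditioning a diffusion denoiser on a global ViT image embedding rather than on CLIP text embeddings of pseudo captions, can be sketched with plain cross-attention. This is a minimal NumPy illustration, not the paper's actual CIDE module: the token count, dimensions, and projection names (`W_tokens`, `cond_tokens`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent, cond, Wq, Wk, Wv):
    """One cross-attention step: UNet latent tokens attend to conditioning tokens."""
    Q = latent @ Wq                                   # (n_latent, d)
    K = cond @ Wk                                     # (n_cond, d)
    V = cond @ Wv                                     # (n_cond, d)
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (n_latent, n_cond)
    return scores @ V                                 # (n_latent, d)

rng = np.random.default_rng(0)
d_latent, d_vit, d = 64, 768, 64
latent = rng.standard_normal((256, d_latent))    # flattened UNet feature map
vit_embedding = rng.standard_normal((1, d_vit))  # global ViT image embedding
# Expand the single ViT vector into a small set of conditioning tokens
# (standing in for the paper's learned embeddings; shapes are assumptions).
W_tokens = rng.standard_normal((d_vit, 8 * d)) * 0.02
cond_tokens = (vit_embedding @ W_tokens).reshape(8, d)
Wq = rng.standard_normal((d_latent, d)) * 0.02
Wk = rng.standard_normal((d, d)) * 0.02
Wv = rng.standard_normal((d, d)) * 0.02
out = cross_attention(latent, cond_tokens, Wq, Wk, Wv)
print(out.shape)  # (256, 64)
```

The point of the sketch is the data flow: the conditioning signal enters the denoiser only through the keys and values of cross-attention, so swapping CLIP text embeddings for ViT image embeddings changes what the latent tokens attend to without changing the backbone.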

Results

| Task | Dataset | Metric | Value | Model |
|------|---------|--------|-------|-------|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.978 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.997 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.218 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.059 | ECoDepth |
| Depth Estimation | NYU-Depth V2 | log 10 | 0.026 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.979 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 1 | ECoDepth |
| Depth Estimation | KITTI Eigen split | RMSE | 1.966 | ECoDepth |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.074 | ECoDepth |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.139 | ECoDepth |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.048 | ECoDepth |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.978 | ECoDepth |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.997 | ECoDepth |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | ECoDepth |
| 3D | NYU-Depth V2 | RMSE | 0.218 | ECoDepth |
| 3D | NYU-Depth V2 | absolute relative error | 0.059 | ECoDepth |
| 3D | NYU-Depth V2 | log 10 | 0.026 | ECoDepth |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.979 | ECoDepth |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.998 | ECoDepth |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 1 | ECoDepth |
| 3D | KITTI Eigen split | RMSE | 1.966 | ECoDepth |
| 3D | KITTI Eigen split | RMSE log | 0.074 | ECoDepth |
| 3D | KITTI Eigen split | Sq Rel | 0.139 | ECoDepth |
| 3D | KITTI Eigen split | absolute relative error | 0.048 | ECoDepth |
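The metrics in the table (Abs Rel, Sq Rel, RMSE, RMSE log, log 10, and the Delta accuracy thresholds) are the standard monocular depth evaluation quantities. A minimal sketch of how they are typically computed, with an illustrative toy ground truth, not the paper's evaluation code:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics, as listed in the results table."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    mask = gt > 0                       # evaluate only on valid ground truth
    pred, gt = pred[mask], gt[mask]
    thresh = np.maximum(pred / gt, gt / pred)
    err = pred - gt
    return {
        "delta1":   float(np.mean(thresh < 1.25)),
        "delta2":   float(np.mean(thresh < 1.25 ** 2)),
        "delta3":   float(np.mean(thresh < 1.25 ** 3)),
        "abs_rel":  float(np.mean(np.abs(err) / gt)),
        "sq_rel":   float(np.mean(err ** 2 / gt)),
        "rmse":     float(np.sqrt(np.mean(err ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        "log10":    float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
    }

gt = np.array([1.0, 2.0, 4.0])          # toy depths in metres (illustrative)
m = depth_metrics(gt * 1.1, gt)         # uniform 10% over-prediction
print(round(m["abs_rel"], 3))           # 0.1
print(m["delta1"])                      # 1.0 (ratio 1.1 < 1.25 everywhere)
```

Note that lower is better for the error metrics (Abs Rel, Sq Rel, RMSE, RMSE log, log 10), while higher is better for the Delta thresholds, which measure the fraction of pixels whose prediction/ground-truth ratio falls below the threshold.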

Related Papers

- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
- Cameras as Relative Positional Encoding (2025-07-14)
- ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)