DDP: Diffusion Model for Dense Visual Prediction

Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

2023-03-30, ICCV 2023
Tasks: Denoising, Segmentation, Semantic Segmentation, Prediction, Depth Estimation, Monocular Depth Estimation


Abstract

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm: a prediction is produced by progressively removing noise from a sample drawn from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design or architecture customization, DDP generalizes easily to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We report results on three representative tasks across six diverse benchmarks. Without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts: semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research.
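The "noise-to-map" loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for a trained, image-conditioned map decoder, and the cosine schedule, the DDIM-style deterministic update, and all function names are choices made only for this sketch.

```python
import numpy as np


def alpha_bar(t):
    """Cumulative signal level at time t in [0, 1] (cosine schedule, assumed)."""
    return np.cos(0.5 * np.pi * t) ** 2


def toy_denoiser(noisy_map, image, t):
    """Hypothetical stand-in for a trained image-conditioned decoder.

    Returns an estimate of the clean map in [-1, 1]. A real model would be a
    network that also uses t; this toy ignores t and blends an image-derived
    target with the current noisy map so the loop has something to refine.
    """
    target = np.where(image > 0.5, 1.0, -1.0)
    return np.clip(0.8 * target + 0.2 * noisy_map, -1.0, 1.0)


def sample_map(image, steps=3, seed=0):
    """Start from Gaussian noise and denoise it in `steps` updates."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(image.shape)       # pure noise at t = 1
    times = np.linspace(1.0, 0.0, steps + 1)   # t = 1 down to t = 0
    for t, s in zip(times[:-1], times[1:]):
        x0_hat = toy_denoiser(x, image, t)     # current clean-map estimate
        ab_t, ab_s = alpha_bar(t), alpha_bar(s)
        # Noise implied by the current sample and the clean-map estimate.
        eps_hat = (x - np.sqrt(ab_t) * x0_hat) / np.sqrt(1.0 - ab_t)
        # Deterministic (DDIM-style) move to the lower noise level s.
        x = np.sqrt(ab_s) * x0_hat + np.sqrt(1.0 - ab_s) * eps_hat
    return x                                   # at s = 0 this equals x0_hat


image = np.random.default_rng(42).random((4, 4))  # toy "image" in [0, 1)
pred_3 = sample_map(image, steps=3)
pred_10 = sample_map(image, steps=10)
```

In this sketch, `steps` is the knob behind what the abstract calls dynamic inference: more steps cost more compute and give more refinement passes. One simple way to probe uncertainty, again as an assumption rather than the paper's procedure, would be to run `sample_map` with several seeds and inspect the per-pixel variance.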

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	NYU-Depth V2	Delta < 1.25	0.921	DDP (step-3)
Depth Estimation	NYU-Depth V2	Delta < 1.25^2	0.99	DDP (step-3)
Depth Estimation	NYU-Depth V2	Delta < 1.25^3	0.998	DDP (step-3)
Depth Estimation	NYU-Depth V2	RMSE	0.329	DDP (step-3)
Depth Estimation	NYU-Depth V2	absolute relative error	0.094	DDP (step-3)
Depth Estimation	NYU-Depth V2	log 10	0.04	DDP (step-3)
Depth Estimation	KITTI Eigen split	Delta < 1.25	0.975	DDP (Swin-L, step-3)
Depth Estimation	KITTI Eigen split	Delta < 1.25^2	0.997	DDP (Swin-L, step-3)
Depth Estimation	KITTI Eigen split	Delta < 1.25^3	0.999	DDP (Swin-L, step-3)
Depth Estimation	KITTI Eigen split	RMSE	2.072	DDP (Swin-L, step-3)
Depth Estimation	KITTI Eigen split	RMSE log	0.076	DDP (Swin-L, step-3)
Depth Estimation	KITTI Eigen split	Sq Rel	0.148	DDP (Swin-L, step-3)
Depth Estimation	KITTI Eigen split	absolute relative error	0.05	DDP (Swin-L, step-3)
Depth Estimation	SUN-RGBD	Delta < 1.25	0.825	DDP (step-3)
Depth Estimation	SUN-RGBD	Delta < 1.25^2	0.973	DDP (step-3)
Depth Estimation	SUN-RGBD	Delta < 1.25^3	0.994	DDP (step-3)
Depth Estimation	SUN-RGBD	RMSE	0.397	DDP (step-3)
Depth Estimation	SUN-RGBD	absolute relative error	0.128	DDP (step-3)
Depth Estimation	SUN-RGBD	log 10	0.056	DDP (step-3)
Semantic Segmentation	Cityscapes val	mIoU	83.9	DDP (ConvNeXt-L, step-3)
Semantic Segmentation	ADE20K	Params (M)	207	DDP (Swin-L, step-3)
Semantic Segmentation	ADE20K	Validation mIoU	54.4	DDP (Swin-L, step-3)
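For readers unfamiliar with the metric names in the table, the sketch below computes them using their commonly used definitions. It is an illustration only: the function names are hypothetical, and the paper's exact evaluation protocol (valid-pixel masks, depth caps, crops) is not given on this page and is not reproduced here.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Common monocular depth metrics; pred and gt are positive arrays."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        # Fraction of pixels whose ratio error is below 1.25^k.
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
        # Absolute relative error (the "REL" figure in the abstract).
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),
        "sq_rel": float(np.mean((pred - gt) ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        "log10": float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
    }


def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))


depth = depth_metrics(pred=[1.0, 2.0, 6.0], gt=[1.0, 2.0, 4.0])
miou = mean_iou(pred=[0, 1, 1, 1], gt=[0, 0, 1, 1], num_classes=2)
```

Note that mIoU is usually reported as a percentage (the table's 83.9 corresponds to 0.839 on the scale returned here), and that lower is better for the error metrics while higher is better for the Delta thresholds and mIoU.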
