Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DDP: Diffusion Model for Dense Visual Prediction

Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

Published: 2023-03-30 (ICCV 2023)
Tasks: Denoising, Segmentation, Semantic Segmentation, Prediction, Depth Estimation, Monocular Depth Estimation

Abstract

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks: without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts, for example 83.9 mIoU on Cityscapes (semantic segmentation), 70.6 mIoU on nuScenes (BEV map segmentation), and 0.05 REL on KITTI (depth estimation). We hope that our approach will serve as a solid baseline and facilitate future research.
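The "noise-to-map" idea in the abstract can be illustrated with a toy sampling loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the cosine noise schedule, the DDIM-style deterministic update, and the names `alpha_bar`, `toy_decoder`, and `noise_to_map` are all hypothetical choices made for illustration. The decoder here is a stand-in that simply returns the image features, so the loop can be run without a trained model.

```python
import math
import random


def alpha_bar(t):
    """Assumed cosine signal level: 1.0 at t=0 (clean map), ~0.0 at t=1 (pure noise)."""
    return math.cos(0.5 * math.pi * t) ** 2


def toy_decoder(image_features, noisy_map, t):
    """Hypothetical stand-in for a learned map decoder.

    A real model would predict the clean map from the image features, the
    current noisy map, and the timestep. This toy ignores the last two and
    returns the features, which keeps the loop testable without training.
    """
    return list(image_features)


def noise_to_map(image_features, steps=3, decoder=toy_decoder, seed=0):
    """Refine a random Gaussian map into a prediction in `steps` denoising steps."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in image_features]
    # Evenly spaced noise levels from t=1.0 (all noise) down to t=0.0 (clean).
    times = [1.0 - i / steps for i in range(steps + 1)]
    for t_now, t_next in zip(times[:-1], times[1:]):
        # Image-guided estimate of the clean map at the current noise level.
        x0_pred = decoder(image_features, x, t_now)
        a_now, a_next = alpha_bar(t_now), alpha_bar(t_next)
        # Noise implied by the current sample and the clean-map estimate.
        denom = math.sqrt(max(1.0 - a_now, 1e-12))
        eps = [(xi - math.sqrt(a_now) * x0i) / denom for xi, x0i in zip(x, x0_pred)]
        # Deterministic (DDIM-style) move to the next, lower noise level.
        x = [math.sqrt(a_next) * x0i + math.sqrt(1.0 - a_next) * ei
             for x0i, ei in zip(x0_pred, eps)]
    return x


prediction = noise_to_map([0.2, 0.8, 0.5], steps=3)
```

Because the number of steps is only an argument of the sampling loop, the same decoder can be run with more or fewer refinement steps at inference time, which is one plausible reading of the "dynamic inference" property the abstract mentions. The default of `steps=3` is an assumption chosen to echo the "step-3" model labels in the results.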

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.921 | DDP (step3) |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.99 | DDP (step3) |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.998 | DDP (step3) |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.329 | DDP (step3) |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.094 | DDP (step3) |
| Depth Estimation | NYU-Depth V2 | log 10 | 0.04 | DDP (step3) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.975 | DDP (Swin-L, step-3) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.997 | DDP (Swin-L, step-3) |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 0.999 | DDP (Swin-L, step-3) |
| Depth Estimation | KITTI Eigen split | RMSE | 2.072 | DDP (Swin-L, step-3) |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.076 | DDP (Swin-L, step-3) |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.148 | DDP (Swin-L, step-3) |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.05 | DDP (Swin-L, step-3) |
| Depth Estimation | SUN-RGBD | Delta < 1.25 | 0.825 | DDP (step-3) |
| Depth Estimation | SUN-RGBD | Delta < 1.25^2 | 0.973 | DDP (step-3) |
| Depth Estimation | SUN-RGBD | Delta < 1.25^3 | 0.994 | DDP (step-3) |
| Depth Estimation | SUN-RGBD | RMSE | 0.397 | DDP (step-3) |
| Depth Estimation | SUN-RGBD | absolute relative error | 0.128 | DDP (step-3) |
| Depth Estimation | SUN-RGBD | log 10 | 0.056 | DDP (step-3) |
| Semantic Segmentation | Cityscapes val | mIoU | 83.9 | DDP (ConvNeXt-L, step-3) |
| Semantic Segmentation | ADE20K | Params (M) | 207 | DDP (Swin-L, step-3) |
| Semantic Segmentation | ADE20K | Validation mIoU | 54.4 | DDP (Swin-L, step-3) |
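For readers unfamiliar with the metric names in the table, the following sketch computes them using their commonly used definitions in the depth-estimation and segmentation literature. It is an illustration only: the function names `depth_metrics` and `mean_iou` are hypothetical, inputs are plain Python lists of per-pixel values, and benchmark-specific details (valid-pixel masks, depth caps, crop conventions, ignored labels) are deliberately left out.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular-depth metrics over paired, strictly positive depths."""
    n = len(gt)
    # Threshold accuracy uses the larger of the two ratios pred/gt and gt/pred.
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt)]
    sq_err = [(p - g) ** 2 for p, g in zip(pred, gt)]
    return {
        # Fraction of pixels whose ratio is below 1.25, 1.25^2, 1.25^3.
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        # Absolute and squared relative error, normalised by ground truth.
        "abs_rel": sum(abs(p - g) / g for p, g in zip(pred, gt)) / n,
        "sq_rel": sum(e / g for e, g in zip(sq_err, gt)) / n,
        # Root mean squared error in linear and natural-log depth.
        "rmse": math.sqrt(sum(sq_err) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)) / n
        ),
        # Mean absolute error in log10 depth.
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in zip(pred, gt)) / n,
    }


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union, skipping classes absent from both inputs.

    Assumes at least one class appears in the prediction or the ground truth.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in zip(pred_labels, gt_labels))
        union = sum(p == c or g == c for p, g in zip(pred_labels, gt_labels))
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


depth = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Note that `mean_iou` returns a fraction in [0, 1], whereas the table reports mIoU as a percentage (for example 83.9). For the threshold accuracies higher is better, while for the error metrics (RMSE, relative error, log 10) lower is better.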

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)