Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Improving Diffusion-Based Image Synthesis with Context Prediction

Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui

2024-01-04 · NeurIPS 2023

Tasks: Denoising · Text-to-Image Generation · Image Inpainting · Unconditional Image Generation · Prediction · Image Generation

Paper · PDF

Abstract

Diffusion models are a new class of generative models that have dramatically advanced image generation, achieving unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along the spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its neighborhood context, impairing diffusion-based image synthesis. As a powerful source of automatic supervisory signal, context has been well studied for learning representations. Inspired by this, we propose ConPreDiff, the first approach to improve diffusion-based image synthesis with context prediction. We explicitly reinforce each point to predict its neighborhood context (i.e., multi-stride features/tokens/pixels) with a context decoder placed at the end of the diffusion denoising blocks during training, and remove the decoder for inference. In this way, each point can better reconstruct itself by preserving its semantic connections with its neighborhood context. This new paradigm of ConPreDiff can generalize to arbitrary discrete and continuous diffusion backbones without introducing extra parameters in the sampling procedure. Extensive experiments are conducted on unconditional image generation, text-to-image generation, and image inpainting tasks. Our ConPreDiff consistently outperforms previous methods and achieves new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
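The core training-time idea — each point predicting its multi-stride neighborhood through a context decoder that is dropped at inference — can be sketched as an auxiliary loss on a denoising block's feature map. This is an illustrative sketch only, not the paper's implementation: `decoder_w` is a hypothetical linear stand-in for the paper's context decoder, and the loss shown is a plain per-offset squared error.

```python
import numpy as np

def neighborhood_context_loss(features, decoder_w, stride=1):
    """Sketch of a ConPreDiff-style context-prediction loss.

    features:  (H, W, C) feature map from a denoising block.
    decoder_w: (C, C) weights of a hypothetical linear "context decoder"
               (stand-in for the paper's decoder; removed at inference).
    For every spatial offset within `stride` of a point, predict the
    neighbor's feature from the point's decoded feature and average the
    squared errors over all offsets.
    """
    H, W, C = features.shape
    pred = features @ decoder_w  # (H, W, C) decoded context representation
    loss, n_terms = 0.0, 0
    for dy in range(-stride, stride + 1):
        for dx in range(-stride, stride + 1):
            if dy == 0 and dx == 0:
                continue  # the point itself is handled by the usual diffusion loss
            # valid overlap between the map and its (dy, dx)-shifted copy
            ys   = slice(max(0, dy),  H + min(0, dy))
            xs   = slice(max(0, dx),  W + min(0, dx))
            ys_n = slice(max(0, -dy), H + min(0, -dy))
            xs_n = slice(max(0, -dx), W + min(0, -dx))
            diff = pred[ys, xs] - features[ys_n, xs_n]
            loss += float(np.mean(diff ** 2))
            n_terms += 1
    return loss / n_terms
```

During training this term would be added to the standard diffusion objective; at sampling time the decoder (here `decoder_w`) is simply not used, so no extra parameters appear in the sampling procedure, matching the abstract's claim.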

Results

Task | Dataset | Metric | Value | Model
Image Generation | COCO (Common Objects in Context) | FID | 6.21 | ConPreDiff
Image Generation | COCO (Common Objects in Context) | Zero-shot FID | 6.21 | ConPreDiff
Image Generation | CelebA | LPIPS | 0.022 | ConPreDiff
Image Inpainting | CelebA | LPIPS | 0.022 | ConPreDiff
Text-to-Image Generation | COCO (Common Objects in Context) | FID | 6.21 | ConPreDiff
Text-to-Image Generation | COCO (Common Objects in Context) | Zero-shot FID | 6.21 | ConPreDiff
10-shot Image Generation | COCO (Common Objects in Context) | FID | 6.21 | ConPreDiff
10-shot Image Generation | COCO (Common Objects in Context) | Zero-shot FID | 6.21 | ConPreDiff
1 Image, 2*2 Stitching | COCO (Common Objects in Context) | FID | 6.21 | ConPreDiff
1 Image, 2*2 Stitching | COCO (Common Objects in Context) | Zero-shot FID | 6.21 | ConPreDiff

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
- Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
- FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
- A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing Constraints (2025-07-17)
- Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)