LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Guangcong Zheng, Xianpan Zhou, XueWei Li, Zhongang Qi, Ying Shan, Xi Li

2023-03-30CVPR 2023 1Layout-to-Image Generation Image Generation

Abstract

Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that can obtain higher generation quality and greater controllability than the previous works. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout to fuse with the normal layout in a unified form. Moreover, Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) are proposed to model the relationship among multiple objects and designed to be object-aware and position-sensitive, allowing for precisely controlling the spatial related information. Extensive experiments show that our LayoutDiffusion outperforms the previous SOTA methods on FID, CAS by relatively 46.35%, 26.70% on COCO-stuff and 44.29%, 41.82% on VG. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.

Results

Task	Dataset	Metric	Value	Model
Image Generation	COCO-Stuff 256x256	FID	31.68	LayoutDiffusion (25steps)
Image Generation	Visual Genome 128x128	FID	16.35	LayoutDiffusion
Image Generation	Visual Genome 256x256	FID	15.63	LayoutDiffusion

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17 Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection2025-07-17 FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization2025-07-17 A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints2025-07-17 Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17 FADE: Adversarial Concept Erasure in Flow Models2025-07-16 CharaConsist: Fine-Grained Consistent Character Generation2025-07-15 CATVis: Context-Aware Thought Visualization2025-07-15