Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

2022-12-01 · CVPR 2023

Tasks: Text-to-Image Generation · Multimodal Generation · Semantic Segmentation · Image Generation · Face Generation · Text-to-Face Generation · Face Sketch Synthesis

Paper · PDF · Code (official)

Abstract

Generating photos satisfying multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward an outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
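The core idea (uniting independently trained conditional diffusion models at sampling time) can be sketched as combining their per-timestep noise predictions. The sketch below is illustrative only: the function name, the additive classifier-free-guidance-style combination, and the use of per-model weights as the "reliability" parameter are assumptions; the paper derives its own exact closed-form combination rule from the Gaussian form of each DDPM sampling step.

```python
import numpy as np

def combined_noise_estimate(eps_uncond, eps_conds, reliabilities):
    """Hedged sketch of multi-model guidance at one diffusion timestep.

    eps_uncond    : unconditional noise prediction, shape (...,)
    eps_conds     : list of conditional noise predictions, one per
                    off-the-shelf model/modality (same shape)
    reliabilities : one weight per model, standing in for the paper's
                    reliability parameter (exact role differs in the paper)
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps = eps_uncond.copy()
    for eps_c, w in zip(eps_conds, reliabilities):
        # Shift the estimate toward each condition, scaled by its reliability.
        eps += w * (np.asarray(eps_c, dtype=float) - eps_uncond)
    return eps

# Toy usage: two "models" (e.g., a text-conditioned and a sketch-conditioned
# DDPM) guiding the same sampling step with different reliabilities.
step = combined_noise_estimate(
    eps_uncond=np.zeros(4),
    eps_conds=[np.ones(4), 2 * np.ones(4)],
    reliabilities=[0.5, 0.25],
)
```

Because the combination happens purely at sampling time, no retraining or paired multi-modal data is needed to add a new condition; one simply plugs in another model's noise prediction with its own reliability weight.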

Results

Task                              | Dataset                | Metric | Value | Model
----------------------------------|------------------------|--------|-------|------------------
Facial Recognition and Modelling  | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
Sketch                            | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
Image Generation                  | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
Image Generation                  | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer
Face Reconstruction               | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
3D                                | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
3D Face Modelling                 | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
3D Face Reconstruction            | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
Text-to-Image Generation          | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
Text-to-Image Generation          | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer
Multimodal Association            | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
10-shot Image Generation          | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
10-shot Image Generation          | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer
1 Image, 2*2 Stitchi              | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
1 Image, 2*2 Stitchi              | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)