Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

2022-12-01 · CVPR 2023

Tasks: Text-to-Image Generation · Multimodal Generation · Semantic Segmentation · Image Generation · Face Generation · Text-to-Face Generation · Face Sketch Synthesis

Paper · PDF · Code (official)

Abstract

Generating photos satisfying multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward an outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
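The core idea (uniting independently trained conditional diffusion models at sampling time) can be sketched as combining their per-timestep noise predictions. The sketch below is illustrative only: the function name, the additive classifier-free-guidance-style combination, and the use of per-model weights as the "reliability" parameter are assumptions; the paper derives its own exact closed-form combination rule from the Gaussian form of each DDPM sampling step.

```python
import numpy as np

def combined_noise_estimate(eps_uncond, eps_conds, reliabilities):
    """Hedged sketch of multi-model guidance at one diffusion timestep.

    eps_uncond    : unconditional noise prediction, shape (...,)
    eps_conds     : list of conditional noise predictions, one per
                    off-the-shelf model/modality (same shape)
    reliabilities : one weight per model, standing in for the paper's
                    reliability parameter (exact role differs in the paper)
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps = eps_uncond.copy()
    for eps_c, w in zip(eps_conds, reliabilities):
        # Shift the estimate toward each condition, scaled by its reliability.
        eps += w * (np.asarray(eps_c, dtype=float) - eps_uncond)
    return eps

# Toy usage: two "models" (e.g., a text-conditioned and a sketch-conditioned
# DDPM) guiding the same sampling step with different reliabilities.
step = combined_noise_estimate(
    eps_uncond=np.zeros(4),
    eps_conds=[np.ones(4), 2 * np.ones(4)],
    reliabilities=[0.5, 0.25],
)
```

Because the combination happens purely at sampling time, no retraining or paired multi-modal data is needed to add a new condition; one simply plugs in another model's noise prediction with its own reliability weight.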

Results

Task                              | Dataset                | Metric | Value | Model
----------------------------------|------------------------|--------|-------|------------------
Facial Recognition and Modelling  | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
Sketch                            | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
Image Generation                  | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
Image Generation                  | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer
Face Reconstruction               | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
3D                                | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
3D Face Modelling                 | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
3D Face Reconstruction            | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
Text-to-Image Generation          | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
Text-to-Image Generation          | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer
Multimodal Association            | Multi-Modal CelebA-HQ  | FID    | 26.09 | Diffusion
10-shot Image Generation          | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
10-shot Image Generation          | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer
1 Image, 2*2 Stitchi              | Multi-Modal-CelebA-HQ  | FID    | 26.09 | Unite and Conquer
1 Image, 2*2 Stitchi              | Multi-Modal-CelebA-HQ  | LPIPS  | 0.519 | Unite and Conquer

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)