Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu

Published: 2020-12-06 · CVPR 2021
Tasks: Text-to-Image Generation · Image Generation · Face Sketch Synthesis
Links: Paper · PDF · Code (official)

Abstract

In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space. The instance-level optimization preserves identity during manipulation. Our model can produce diverse and high-quality images at an unprecedented resolution of 1024×1024. Using a control mechanism based on style mixing, TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multi-modal synthesis, we propose Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images with corresponding semantic segmentation maps, sketches, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.
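The instance-level optimization described above can be sketched as a latent-space update: starting from the inverted latent of a real image, the latent is optimized to increase text-image similarity in the common embedding space while a penalty term keeps it close to the original latent for identity preservation. The sketch below is illustrative only, assuming stand-in linear encoders and NumPy gradient steps; the actual method uses a pretrained StyleGAN inverter and learned visual-linguistic encoders.

```python
import numpy as np

# Stand-in encoders (NOT the paper's models): random linear maps into a
# shared embedding space, purely to illustrate the optimization loop.
rng = np.random.default_rng(0)
LATENT_DIM, EMBED_DIM = 512, 128
W_img = rng.normal(size=(EMBED_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
W_txt = rng.normal(size=(EMBED_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def embed(W, v):
    """Project into the common embedding space and L2-normalize."""
    e = W @ v
    return e / np.linalg.norm(e)

def manipulate(w_init, text_vec, lam=0.1, lr=0.5, steps=200):
    """Instance-level optimization sketch: maximize cosine similarity
    between the image and text embeddings, with an identity term
    lam/2 * ||w - w_init||^2 keeping the latent near the inversion."""
    w = w_init.copy()
    t = embed(W_txt, text_vec)
    for _ in range(steps):
        e = W_img @ w
        n = np.linalg.norm(e)
        sim = (e / n) @ t
        # Gradient of cosine similarity wrt w, then gradient-descent step
        # on loss = -sim + lam/2 * ||w - w_init||^2.
        grad_sim = (W_img.T @ (t - sim * e / n)) / n
        w -= lr * (-grad_sim + lam * (w - w_init))
    return w

w0 = rng.normal(size=LATENT_DIM)    # "inverted" latent of a real image
text = rng.normal(size=LATENT_DIM)  # encoded target description
w_edit = manipulate(w0, text)
```

After optimization, the edited latent stays close to the inverted one (the identity term bounds the deviation) while its embedding aligns better with the text embedding; in the real pipeline, decoding `w_edit` through the StyleGAN generator would yield the manipulated image.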

Results

Task                      | Dataset               | Metric | Value  | Model
Image Generation          | Multi-Modal-CelebA-HQ | Acc    | 18.4   | TediGAN-A
Image Generation          | Multi-Modal-CelebA-HQ | FID    | 106.37 | TediGAN-A
Image Generation          | Multi-Modal-CelebA-HQ | LPIPS  | 0.456  | TediGAN-A
Image Generation          | Multi-Modal-CelebA-HQ | Real   | 22.6   | TediGAN-A
Text-to-Image Generation  | Multi-Modal-CelebA-HQ | Acc    | 18.4   | TediGAN-A
Text-to-Image Generation  | Multi-Modal-CelebA-HQ | FID    | 106.37 | TediGAN-A
Text-to-Image Generation  | Multi-Modal-CelebA-HQ | LPIPS  | 0.456  | TediGAN-A
Text-to-Image Generation  | Multi-Modal-CelebA-HQ | Real   | 22.6   | TediGAN-A
10-shot image generation  | Multi-Modal-CelebA-HQ | Acc    | 18.4   | TediGAN-A
10-shot image generation  | Multi-Modal-CelebA-HQ | FID    | 106.37 | TediGAN-A
10-shot image generation  | Multi-Modal-CelebA-HQ | LPIPS  | 0.456  | TediGAN-A
10-shot image generation  | Multi-Modal-CelebA-HQ | Real   | 22.6   | TediGAN-A
1 Image, 2*2 Stitchi      | Multi-Modal-CelebA-HQ | Acc    | 18.4   | TediGAN-A
1 Image, 2*2 Stitchi      | Multi-Modal-CelebA-HQ | FID    | 106.37 | TediGAN-A
1 Image, 2*2 Stitchi      | Multi-Modal-CelebA-HQ | LPIPS  | 0.456  | TediGAN-A
1 Image, 2*2 Stitchi      | Multi-Modal-CelebA-HQ | Real   | 22.6   | TediGAN-A

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing Constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)