Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Towards Open-World Text-Guided Face Image Generation and Manipulation

Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu

Date: 2021-04-18
Tasks: Text-to-Image Generation, Semantic Segmentation, Image Generation, Language Modelling

Abstract

Existing text-guided image synthesis methods produce results of limited quality, with at most $256^2$ resolution, and their textual instructions are constrained to a small corpus. In this work, we propose a unified framework for both face image generation and manipulation that produces diverse, high-quality images at an unprecedented resolution of 1024 from multimodal inputs. More importantly, our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing. Specifically, we propose a new paradigm of text-guided image generation and manipulation that builds on the superior characteristics of a pretrained GAN model. The paradigm includes two novel strategies. The first is to train a text encoder to obtain latent codes that align with the hierarchical semantics of the pretrained GAN model. The second is to directly optimize the latent codes in the latent space of the pretrained GAN model, with guidance from a pretrained language model. The latent codes can be randomly sampled from a prior distribution or inverted from a given image, which provides inherent support for both image generation and manipulation from multi-modal inputs, such as sketches or semantic labels, with textual guidance. To facilitate text-guided multi-modal synthesis, we propose Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and their corresponding semantic segmentation maps, sketches, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.
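
The second strategy described above (optimizing a latent code under guidance from a text model) can be illustrated with a toy sketch. This is not the TediGAN implementation: the linear `generator` and `projector` matrices are hypothetical stand-ins for the pretrained GAN and the pretrained language model, and the `lam` penalty that keeps the code near its starting point is an assumption added for illustration. The starting code `w_init` plays the role of a latent code that is either sampled from a prior (generation) or inverted from an image (manipulation).

```python
import numpy as np


def optimize_latent(w_init, generator, projector, text_emb, lam=0.1, steps=200):
    """Gradient descent on a latent code w (toy, fully linear setting).

    Minimizes
        ||projector @ generator @ w - text_emb||^2 + lam * ||w - w_init||^2
    where `generator` and `projector` are hypothetical linear stand-ins for a
    pretrained GAN and a pretrained text-image embedding model.
    """
    m = projector @ generator  # maps latent space directly to embedding space
    # Step size 1/L, with L the Lipschitz constant of the quadratic's gradient.
    step = 1.0 / (2.0 * (np.linalg.norm(m, 2) ** 2 + lam))
    w = w_init.copy()  # leave the caller's starting code untouched
    losses = []
    for _ in range(steps):
        residual = m @ w - text_emb  # mismatch with the text embedding
        drift = w - w_init           # distance from the starting code
        losses.append(float(residual @ residual + lam * (drift @ drift)))
        grad = 2.0 * (m.T @ residual) + 2.0 * lam * drift
        w -= step * grad
    return w, losses


# Toy usage with random data (all dimensions are arbitrary assumptions).
rng = np.random.default_rng(0)
latent_dim, image_dim, embed_dim = 8, 32, 4
generator = rng.normal(size=(image_dim, latent_dim))
projector = rng.normal(size=(embed_dim, image_dim))
text_emb = rng.normal(size=embed_dim)
w_init = rng.normal(size=latent_dim)  # sampled from a prior, or an inverted image

w_opt, losses = optimize_latent(w_init, generator, projector, text_emb)
```

In a real system the two matrices would be replaced by frozen neural networks and the gradient would come from automatic differentiation; the structure of the loop (frozen models, trainable latent code) is the point of the sketch.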

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | Multi-Modal-CelebA-HQ | Acc | 20.4 | TediGAN-B |
| Image Generation | Multi-Modal-CelebA-HQ | FID | 101.42 | TediGAN-B |
| Image Generation | Multi-Modal-CelebA-HQ | LPIPS | 0.461 | TediGAN-B |
| Image Generation | Multi-Modal-CelebA-HQ | Real | 21 | TediGAN-B |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | Acc | 20.4 | TediGAN-B |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | FID | 101.42 | TediGAN-B |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | LPIPS | 0.461 | TediGAN-B |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | Real | 21 | TediGAN-B |
| 10-shot image generation | Multi-Modal-CelebA-HQ | Acc | 20.4 | TediGAN-B |
| 10-shot image generation | Multi-Modal-CelebA-HQ | FID | 101.42 | TediGAN-B |
| 10-shot image generation | Multi-Modal-CelebA-HQ | LPIPS | 0.461 | TediGAN-B |
| 10-shot image generation | Multi-Modal-CelebA-HQ | Real | 21 | TediGAN-B |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | Acc | 20.4 | TediGAN-B |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | FID | 101.42 | TediGAN-B |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | LPIPS | 0.461 | TediGAN-B |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | Real | 21 | TediGAN-B |

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)