TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Learning What and Where to Draw

Learning What and Where to Draw

Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee

2016-10-08NeurIPS 2016 12Text-to-Image GenerationImage Generation
PaperPDF

Abstract

Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers. While existing models can synthesize images based on global constraints such as a class label or caption, they do not provide control over pose or object location. We propose a new model, the Generative Adversarial What-Where Network (GAWWN), that synthesizes images given instructions describing what content to draw in which location. We show high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset, conditioned on both informal text descriptions and also object location. Our system exposes control over both the bounding box around the bird and its constituent parts. By modeling the conditional distributions over part locations, our system also enables conditioning on arbitrary subsets of parts (e.g. only the beak and tail), yielding an efficient interface for picking part locations. We also show preliminary results on the more challenging domain of text- and location-controllable synthesis of images of human actions on the MPII Human Pose dataset.

Results

TaskDatasetMetricValueModel
Image GenerationCUBFID67.22GAWWN
Image GenerationCUBInception score3.62GAWWN
Text-to-Image GenerationCUBFID67.22GAWWN
Text-to-Image GenerationCUBInception score3.62GAWWN
10-shot image generationCUBFID67.22GAWWN
10-shot image generationCUBInception score3.62GAWWN
1 Image, 2*2 StitchiCUBFID67.22GAWWN
1 Image, 2*2 StitchiCUBInception score3.62GAWWN

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection2025-07-17FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization2025-07-17A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints2025-07-17Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17FADE: Adversarial Concept Erasure in Flow Models2025-07-16CharaConsist: Fine-Grained Consistent Character Generation2025-07-15CATVis: Context-Aware Thought Visualization2025-07-15