ChatPainter: Improving Text to Image Generation using Dialogue
Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, Yoshua Bengio
Abstract
Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image and insufficient for the model to be able to understand which objects in the images correspond to which words in the captions. We show that adding a dialogue that further describes the scene leads to significant improvement in the inception score and in the quality of generated images on the MS COCO dataset.
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | COCO (Common Objects in Context) | Inception score | 9.74 | ChatPainter |
| Text-to-Image Generation | COCO (Common Objects in Context) | Inception score | 9.74 | ChatPainter |
| 10-shot image generation | COCO (Common Objects in Context) | Inception score | 9.74 | ChatPainter |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | Inception score | 9.74 | ChatPainter |
Related Papers
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection2025-07-17FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization2025-07-17A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints2025-07-17Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17FADE: Adversarial Concept Erasure in Flow Models2025-07-16CharaConsist: Fine-Grained Consistent Character Generation2025-07-15CATVis: Context-Aware Thought Visualization2025-07-15