Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Re-Imagen: Retrieval-Augmented Text-to-Image Generator

Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

2022-09-29 · Text-to-Image Generation · Image-Text Retrieval · Text Retrieval · Retrieval · Image Generation

Paper | PDF

Abstract

Research on text-to-image generation has witnessed significant progress in generating diverse and photo-realistic images, driven by diffusion and auto-regressive models trained on large-scale image-text data. Though state-of-the-art models can generate high-quality images of common entities, they often have difficulty generating images of uncommon entities, such as 'Chortai (dog)' or 'Picarones (food)'. To tackle this issue, we present the Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model that uses retrieved information to produce high-fidelity and faithful images, even for rare or unseen entities. Given a text prompt, Re-Imagen accesses an external multi-modal knowledge base to retrieve relevant (image, text) pairs and uses them as references to generate the image. With this retrieval step, Re-Imagen is augmented with the knowledge of high-level semantics and low-level visual details of the mentioned entities, and thus improves its accuracy in generating the entities' visual appearances. We train Re-Imagen on a constructed dataset containing (image, text, retrieval) triples to teach the model to ground on both text prompt and retrieval. Furthermore, we develop a new sampling strategy to interleave the classifier-free guidance for text and retrieval conditions to balance the text and retrieval alignment. Re-Imagen achieves significant gain on FID score over COCO and WikiImage. To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters. Human evaluation on EntityDrawBench shows that Re-Imagen can significantly improve the fidelity of generated images, especially on less frequent entities.
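The sampling strategy mentioned above alternates which condition receives classifier-free guidance at each denoising step, rather than guiding on both conditions jointly. The sketch below illustrates the general idea with toy noise predictions; the alternating even/odd schedule, the function names, and the guidance weights are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of interleaved classifier-free guidance (CFG).
# Assumption: the denoiser can produce three noise estimates per step --
# unconditional, text-conditioned, and retrieval-conditioned -- and the
# sampler alternates which condition is guided on at each step.

def cfg(eps_uncond, eps_cond, scale):
    """Standard CFG combination: push the unconditional estimate
    toward the conditional one by the guidance scale."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def interleaved_guidance(step, eps_uncond, eps_text, eps_retrieval,
                         w_text=7.5, w_retrieval=2.0):
    """Alternate text and retrieval guidance across denoising steps
    (a hypothetical even/odd schedule for illustration)."""
    if step % 2 == 0:
        return cfg(eps_uncond, eps_text, w_text)
    return cfg(eps_uncond, eps_retrieval, w_retrieval)

# Toy example: 1-dimensional "noise" estimates.
even = interleaved_guidance(0, [0.0], [1.0], [2.0])  # text-guided step
odd = interleaved_guidance(1, [0.0], [1.0], [2.0])   # retrieval-guided step
print(even, odd)
```

Interleaving lets the sampler balance text alignment against retrieval alignment without summing two guidance terms into a single over-amplified update.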

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | COCO (Common Objects in Context) | FID | 5.25 | Re-Imagen (Finetuned) |
| Image Generation | COCO (Common Objects in Context) | FID | 6.88 | Re-Imagen |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 5.25 | Re-Imagen (Finetuned) |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 6.88 | Re-Imagen |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 5.25 | Re-Imagen (Finetuned) |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 6.88 | Re-Imagen |
| 1 Image, 2*2 Stitching | COCO (Common Objects in Context) | FID | 5.25 | Re-Imagen (Finetuned) |
| 1 Image, 2*2 Stitching | COCO (Common Objects in Context) | FID | 6.88 | Re-Imagen |

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing Constraints (2025-07-17)