Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DrawBench

Introduced 2022-05-23

DrawBench is a comprehensive and challenging benchmark for text-to-image models, introduced by the Imagen research team.

  1. Purpose and Context:

    • DrawBench serves as an evaluation benchmark specifically designed to assess the performance of text-to-image models.
    • It allows researchers and practitioners to compare different methods and understand their strengths and weaknesses in generating images from textual descriptions.
  2. Imagen: Text-to-Image Diffusion Models:

    • Imagen is a text-to-image diffusion model developed by the Google Research Brain Team; it was state of the art when released in 2022.
    • It combines the power of large transformer language models (such as T5) for understanding text with the strength of diffusion models for high-fidelity image generation.
    • Key Discovery: Imagen demonstrates that generic large language models pretrained on text-only corpora are remarkably effective at encoding text for image synthesis.
    • Photorealism and Language Understanding: The authors report an unprecedented degree of photorealism and a deep level of language understanding.
    • FID Score: Imagen reports a zero-shot FID (Fréchet Inception Distance) of 7.27 on the COCO dataset, state of the art at the time of publication, without ever being trained on COCO. Lower FID is better.
    • Human Raters' Perception: Human raters find Imagen samples to be on par with the COCO data itself in terms of image-text alignment.
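
For reference, the FID quoted above compares the statistics of Inception-network features extracted from real and generated images. The symbols below are not in the original text; they follow the standard definition, with $\mu_r, \Sigma_r$ the feature mean and covariance for real images and $\mu_g, \Sigma_g$ those for generated images:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

The score is zero when the two feature distributions have identical means and covariances, which is why lower values indicate generated images that are statistically closer to the reference set.
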
  3. DrawBench: A Comprehensive Benchmark:

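    • DrawBench is a set of 200 text prompts grouped into 11 categories that probe different capabilities, such as rendering colors, counting objects, handling spatial relations, and drawing text.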
    • Researchers can compare Imagen with other recent methods, including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2.
    • Human raters prefer Imagen over other models in side-by-side comparisons, considering both sample quality and image-text alignment.
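
The side-by-side comparison described above reduces to a vote tally per question (sample quality or image-text alignment). The following is a minimal sketch, assuming each rater picks model A, model B, or no preference for a given prompt. The function name `preference_rates`, the option labels, and the vote data are hypothetical and are not taken from the Imagen study.

```python
from collections import Counter

# Option labels are an assumption for this sketch: "A" and "B" are the two
# models being compared, "tie" means the rater had no preference.
OPTIONS = ("A", "tie", "B")


def preference_rates(votes):
    """Return the fraction of votes each option received."""
    if not votes:
        raise ValueError("votes must be non-empty")
    unknown = set(votes) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unexpected vote labels: {sorted(unknown)}")
    counts = Counter(votes)
    total = len(votes)
    return {option: counts.get(option, 0) / total for option in OPTIONS}


# Hypothetical votes on image-text alignment, for illustration only.
alignment_votes = ["A", "A", "tie", "B", "A", "A", "B", "A"]
rates = preference_rates(alignment_votes)
print(rates)
```

Running the same tally separately for the quality question and the alignment question gives one preference rate per model per question, which is the form in which such pairwise results are usually reported.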
  4. Example Prompts from the Imagen Family:

    • Imagen generates diverse and imaginative images from textual prompts. Example prompts include:
      • A strawberry mug filled with white sesame seeds, floating in a dark chocolate sea.
      • A brain riding a rocketship heading towards the moon.
      • A dragon fruit wearing a karate belt in the snow.
      • A small cactus wearing a straw hat and neon sunglasses in the Sahara desert.
      • A photo of a Corgi dog riding a bike in Times Square, wearing sunglasses and a beach hat.
      • Teddy bears swimming at the Olympics 400m Butterfly event.
      • Sprouts in the shape of the text 'Imagen' coming out of a fairytale book.
      • A transparent sculpture of a duck made out of glass, in front of a painting of a landscape.
      • A single beam of light entering the room from the ceiling, illuminating an easel with a Rembrandt painting of a raccoon.
  5. Technical Details:

    • Imagen uses a large frozen T5-XXL encoder to encode input text into embeddings.
    • A conditional diffusion model maps these embeddings to a 64×64 image, and two text-conditional super-resolution diffusion models upsample it to 256×256 and then 1024×1024.

Source: Conversation with Bing, 3/18/2024
(1) Imagen: Text-to-Image Diffusion Models. https://imagen.research.google/
(2) Evaluating Diffusion Models - Hugging Face. https://huggingface.co/docs/diffusers/conceptual/evaluation
(3) shunk031/DrawBench · Datasets at Hugging Face. https://huggingface.co/datasets/shunk031/DrawBench
(4) sayakpaul/drawbench · Datasets at Hugging Face. https://huggingface.co/datasets/sayakpaul/drawbench

Benchmarks

• 1 Image, 2*2 Stitchi / Aesthetics (Laion Aesthetics Predictor)
• 1 Image, 2*2 Stitchi / Human Preference Alignment (HPSv2)
• 1 Image, 2*2 Stitchi / Text Alignment (SentenceBERT)
• 10-shot image generation / Aesthetics (Laion Aesthetics Predictor)
• 10-shot image generation / Human Preference Alignment (HPSv2)
• 10-shot image generation / Text Alignment (SentenceBERT)
• Image Generation / Aesthetics (Laion Aesthetics Predictor)
• Image Generation / Human Preference Alignment (HPSv2)
• Image Generation / Text Alignment (SentenceBERT)
• Text-to-Image Generation / Aesthetics (Laion Aesthetics Predictor)
• Text-to-Image Generation / Human Preference Alignment (HPSv2)
• Text-to-Image Generation / Text Alignment (SentenceBERT)

Statistics

Papers: 82
Benchmarks: 12

Links

Homepage

Tasks

• 1 Image, 2*2 Stitchi
• 10-shot image generation
• Image Generation
• Text-to-Image Generation