
The GenAI-Bench benchmark consists of 1,600 challenging real-world text prompts sourced from professional designers. Compared to benchmarks such as PartiPrompt and T2I-CompBench, GenAI-Bench captures a wider range of aspects of compositional text-to-visual generation, from basic skills (scene, attribute, relation) to advanced ones (counting, comparison, differentiation, logic). GenAI-Bench also collects human alignment ratings (1-to-5 Likert scale) on images and videos generated by ten leading models, such as Stable Diffusion, DALL-E 3, Midjourney v6, Pika v1, and Gen2.

GenAI-Bench:

Prompts: 1,600 prompts sourced from professional designers.
Compositional Skill Tags: Each prompt carries multiple compositional skill tags, categorized as either Basic Skill or Advanced Skill. For detailed definitions and examples, please refer to our paper.
Images: Generated images are collected from DALLE_3, DeepFloyd_I_XL_v1, Midjourney_6, SDXL_2_1, SDXL_Base and SDXL_Turbo.
Human Ratings: 1-to-5 Likert scale ratings for each image.
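
The components listed above (prompt, skill tags, generating model, human ratings) suggest one simple way to summarize the data: average the Likert ratings per model or per skill tag. The following is a minimal sketch of that idea, not the dataset's actual loading code. The `RatedImage` record, its field names, and the toy rows are all hypothetical and chosen for illustration only; the real release may use a different schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RatedImage:
    """One generated image with its annotations (hypothetical schema)."""
    prompt: str
    model: str             # e.g. "DALLE_3" or "SDXL_Turbo"
    skill_tags: list[str]  # e.g. ["scene", "counting"]
    ratings: list[int]     # 1-to-5 Likert scores from human raters


def image_score(record: RatedImage) -> float:
    """Average the human ratings of one image, checking the 1-to-5 range."""
    for rating in record.ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-to-5 scale")
    return mean(record.ratings)


def mean_rating_by_model(records: list[RatedImage]) -> dict[str, float]:
    """Mean per-image score for each generating model."""
    scores = defaultdict(list)
    for record in records:
        scores[record.model].append(image_score(record))
    return {model: mean(values) for model, values in scores.items()}


def mean_rating_by_tag(records: list[RatedImage]) -> dict[str, float]:
    """Mean per-image score for each skill tag.

    An image counts once toward every tag attached to its prompt.
    """
    scores = defaultdict(list)
    for record in records:
        for tag in record.skill_tags:
            scores[tag].append(image_score(record))
    return {tag: mean(values) for tag, values in scores.items()}


if __name__ == "__main__":
    # Toy rows invented for illustration; these are not real GenAI-Bench data.
    toy_records = [
        RatedImage("three cats on a sofa", "DALLE_3", ["scene", "counting"], [5, 4, 4]),
        RatedImage("three cats on a sofa", "SDXL_Turbo", ["scene", "counting"], [2, 3, 2]),
    ]
    print(mean_rating_by_model(toy_records))
    print(mean_rating_by_tag(toy_records))
```

Averaging per image first and then across images gives every image equal weight even if the number of raters differs between images; pooling all raw ratings instead would be an equally defensible choice.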

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation