PhyBench
PhyBench is a Text-to-Image (T2I) evaluation dataset designed to assess the physical commonsense of T2I models¹. Introduced by OpenGVLab, it contains 700 prompts spanning four primary categories (mechanics, optics, thermodynamics, and material properties), which together cover 31 distinct physical scenarios¹.
PhyBench measures how well T2I models such as DALL-E 3 and Gemini generate images that are consistent with physical principles. Its results indicate that although these models can usually translate a text prompt into an image, they frequently depict physical scenarios incorrectly in every category except optics¹.
For example, given prompts such as "A cylindrical block of wood placed in front of a mirror" or "An apple, a piece of wood, and an iron block in a tank filled with water", even advanced models like DALL-E 3 and Midjourney have been shown to misrepresent the objects or omit them entirely¹.
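A finding such as "errors in every category except optics" comes from breaking evaluation scores down by category. As an illustration only, the sketch below assumes each generated image has already received a numeric score in [0, 1]; this scoring convention, the (category, score) input format, and the function names `per_category_means` and `weakest_categories` are all hypothetical and are not PhyBench's actual evaluation code. It shows how averaging scores per category surfaces the weakest categories.

```python
from collections import defaultdict
from statistics import mean

# The four PhyBench categories named in the description above.
CATEGORIES = ("mechanics", "optics", "thermodynamics", "material properties")


def per_category_means(results):
    """Average per-image scores within each category.

    `results` is a hypothetical list of (category, score) pairs. The score
    is assumed to be a float in [0, 1]; this is an illustrative convention,
    not PhyBench's actual scoring scheme.
    """
    buckets = defaultdict(list)
    for category, score in results:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        buckets[category].append(score)
    # Keep the fixed category order and skip categories with no scores.
    return {c: mean(buckets[c]) for c in CATEGORIES if buckets[c]}


def weakest_categories(means):
    """Return category names sorted from lowest to highest mean score."""
    return sorted(means, key=means.get)
```

With toy inputs (not PhyBench data) such as `[("optics", 1.0), ("optics", 0.5), ("mechanics", 0.0), ("mechanics", 0.5)]`, `per_category_means` returns `{"mechanics": 0.25, "optics": 0.75}`, and `weakest_categories` then lists `mechanics` first.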
(1) GitHub - OpenGVLab/PhyBench. https://github.com/OpenGVLab/PhyBench.
(2) Maxon - Cinebench: Evaluate your computer's hardware capabilities. https://www.maxon.net/en/cinebench.
(3) GitHub - KegangWangCCNU/PhysBench. https://github.com/KegangWangCCNU/PhysBench.