Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng

2025-01-23 | Text-to-Image Generation | Image Generation

Abstract

Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. However, it remains an open question whether such strategies can be applied to verifying and reinforcing image generation. In this paper, we provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We focus on three techniques: scaling test-time computation for verification, aligning model preferences with Direct Preference Optimization (DPO), and integrating these techniques for complementary effects. Our results demonstrate that these approaches can be effectively adapted and combined to significantly improve image generation performance. Furthermore, given the pivotal role of reward models in our findings, we propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation. PARM adaptively assesses each generation step through a potential assessment approach, merging the strengths of existing reward models, and PARM++ further introduces a reflection mechanism to self-correct unsatisfactory generated images. Using the reasoning strategies we investigate, we enhance a baseline model, Show-o, to achieve superior results, with a significant +24% improvement on the GenEval benchmark, surpassing Stable Diffusion 3 by +15%. We hope our study provides unique insights and paves a new path for integrating CoT reasoning with autoregressive image generation. Code and models are released at https://github.com/ZiyuGuo99/Image-Generation-CoT
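
As a rough illustration of two of the techniques named in the abstract, the sketch below shows outcome-level best-of-N selection (one simple form of test-time verification) and the standard DPO loss for a single preference pair. It is not the authors' implementation: the function names `best_of_n` and `dpo_loss`, the `reward` callable, and the default `beta=0.1` are assumptions made for illustration, and PARM's step-wise potential assessment is not modeled here.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], reward: Callable[[T], float]) -> T:
    """Pick the candidate with the highest reward score.

    `reward` is a hypothetical stand-in for a learned reward model
    that scores a finished generation.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that as the policy favors the chosen sample more strongly than the reference does.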

Results

Task | Dataset | Metric | Value | Model
Image Generation | GenEval | Overall | 0.77 | Show-o [xie2024show] PARM It. DPO PARM
Image Generation | GenEval | Overall | 0.75 | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM
Text-to-Image Generation | GenEval | Overall | 0.77 | Show-o [xie2024show] PARM It. DPO PARM
Text-to-Image Generation | GenEval | Overall | 0.75 | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM
10-shot image generation | GenEval | Overall | 0.77 | Show-o [xie2024show] PARM It. DPO PARM
10-shot image generation | GenEval | Overall | 0.75 | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM
1 Image, 2*2 Stitchi | GenEval | Overall | 0.77 | Show-o [xie2024show] PARM It. DPO PARM
1 Image, 2*2 Stitchi | GenEval | Overall | 0.75 | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)