Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

2023-05-22 · Denoising · Text-to-Image Generation · Reinforcement Learning · Decision Making · Language Modelling

Paper · PDF · Code (official)

Abstract

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we refer to as denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO is able to adapt text-to-image diffusion models to objectives that are difficult to express via prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model without the need for additional data collection or human annotation. The project's website can be found at http://rl-diffusion.github.io.
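The core idea in the abstract — treating the multi-step denoising chain as a decision-making problem and applying a policy gradient to a downstream reward on the final sample — can be sketched with a REINFORCE-style update on a toy 1-D denoising chain. The linear "denoiser" (a single mean-shift parameter) and the synthetic reward below are stand-ins chosen for illustration, not the paper's model or code:

```python
import numpy as np

# Toy sketch of DDPO's score-function (REINFORCE) variant: each denoising
# step is an action, and we ascend  E[ sum_t grad log p(x_{t-1}|x_t) * r(x_0) ].
rng = np.random.default_rng(0)
T, sigma, lr = 10, 0.5, 0.01
theta = 0.0  # single "denoiser" parameter: mean shift applied at every step

def reward(x0):
    # hypothetical downstream objective (e.g. an aesthetic score would go
    # here in the real setting): prefer final samples near +2
    return -(x0 - 2.0) ** 2

def sample_trajectory(theta):
    x = rng.normal()  # x_T ~ N(0, 1)
    grad_logp = 0.0
    for _ in range(T):
        mean = x + theta            # denoising mean predicted by the "model"
        x_next = rng.normal(mean, sigma)
        # d/dtheta of log N(x_next; x + theta, sigma^2)
        grad_logp += (x_next - mean) / sigma**2
        x = x_next
    return x, grad_logp             # final sample x_0 and its score

for step in range(200):
    grads, rs = [], []
    for _ in range(64):             # batch of full denoising trajectories
        x0, g = sample_trajectory(theta)
        grads.append(g)
        rs.append(reward(x0))
    rs = np.array(rs)
    rs = (rs - rs.mean()) / (rs.std() + 1e-8)  # whitened rewards as baseline
    theta += lr * np.mean(np.array(grads) * rs)  # policy-gradient ascent
```

After training, `theta` settles near 0.2, so the T = 10 accumulated shifts move the final sample toward the reward's optimum at +2 — the same mechanism, scaled up, that lets DDPO steer a text-to-image model toward compressibility or aesthetic objectives.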

Results

The same DrawBench results are reported under four task leaderboards (Image Generation, Text-to-Image Generation, 10-shot image generation, and 1 Image, 2*2 Stitchi):

| Model | Dataset | Metric | Value |
|---|---|---|---|
| LCM (DDPO) | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 6.0121 |
| LCM (DDPO) | DrawBench | Human Preference Alignment (HPSv2) | 0.2803 |
| LCM (DDPO) | DrawBench | Text Alignment (SentenceBERT) | 0.5721 |
| Stable Diffusion 1.5 (DDPO) | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 5.6748 |
| Stable Diffusion 1.5 (DDPO) | DrawBench | Human Preference Alignment (HPSv2) | 0.2673 |
| Stable Diffusion 1.5 (DDPO) | DrawBench | Text Alignment (SentenceBERT) | 0.6024 |

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion (2025-07-18)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)