Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Diffusion Model Alignment Using Direct Preference Optimization

Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

2023-11-21 · CVPR 2024 · Text-to-Image Generation

Abstract

Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality images and captions to improve visual appeal and text alignment. We propose Diffusion-DPO, a method to align diffusion models to human preferences by directly optimizing on human comparison data. Diffusion-DPO is adapted from the recently developed Direct Preference Optimization (DPO), a simpler alternative to RLHF which directly optimizes a policy that best satisfies human preferences under a classification objective. We re-formulate DPO to account for a diffusion model notion of likelihood, utilizing the evidence lower bound to derive a differentiable objective. Using the Pick-a-Pic dataset of 851K crowdsourced pairwise preferences, we fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO. Our fine-tuned base model significantly outperforms both base SDXL-1.0 and the larger SDXL-1.0 model consisting of an additional refinement model in human evaluation, improving visual appeal and prompt alignment. We also develop a variant that uses AI feedback and has comparable performance to training on human preferences, opening the door for scaling of diffusion model alignment methods.
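The abstract's key step — re-formulating DPO over a diffusion model's likelihood via the evidence lower bound — reduces, per training example, to comparing denoising errors: the fine-tuned model is rewarded when it lowers the denoising error on the preferred image, relative to the frozen reference model, more than on the dispreferred image. A minimal NumPy sketch of that per-example loss is below; the function name, argument names, and the β value are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def diffusion_dpo_loss(err_w, ref_err_w, err_l, ref_err_l, beta=5000.0):
    """Sketch of a per-example Diffusion-DPO-style objective.

    err_w / err_l: the fine-tuned model's squared denoising errors
        ||eps - eps_theta(x_t, t)||^2 on the preferred (w) and
        dispreferred (l) images at a sampled noise/timestep draw.
    ref_err_w / ref_err_l: the same errors under the frozen reference model.
    beta: regularization strength toward the reference (illustrative value).
    """
    # The model "wins" when it improves the preferred image's denoising
    # error, relative to the reference, more than the dispreferred one's.
    margin = -beta * ((err_w - ref_err_w) - (err_l - ref_err_l))
    # -log sigmoid(margin), computed stably as log(1 + exp(-margin)).
    return np.logaddexp(0.0, -margin)
```

With equal errors everywhere the loss sits at log 2 and it decreases as the model preferentially denoises the winning image; in actual training the errors would come from the diffusion model's noise-prediction network, averaged over sampled timesteps and noise, with gradients flowing only through the fine-tuned model's terms.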

Results

Task | Dataset | Metric | Value | Model
Image Generation | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 6.043 | LCM (DPO)
Image Generation | DrawBench | Human Preference Alignment (HPSv2) | 0.2814 | LCM (DPO)
Image Generation | DrawBench | Text Alignment (SentenceBERT) | 0.572 | LCM (DPO)
Image Generation | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 5.6205 | Stable Diffusion 1.5 (DPO)
Image Generation | DrawBench | Human Preference Alignment (HPSv2) | 0.2672 | Stable Diffusion 1.5 (DPO)
Image Generation | DrawBench | Text Alignment (SentenceBERT) | 0.6075 | Stable Diffusion 1.5 (DPO)
Text-to-Image Generation | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 6.043 | LCM (DPO)
Text-to-Image Generation | DrawBench | Human Preference Alignment (HPSv2) | 0.2814 | LCM (DPO)
Text-to-Image Generation | DrawBench | Text Alignment (SentenceBERT) | 0.572 | LCM (DPO)
Text-to-Image Generation | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 5.6205 | Stable Diffusion 1.5 (DPO)
Text-to-Image Generation | DrawBench | Human Preference Alignment (HPSv2) | 0.2672 | Stable Diffusion 1.5 (DPO)
Text-to-Image Generation | DrawBench | Text Alignment (SentenceBERT) | 0.6075 | Stable Diffusion 1.5 (DPO)
10-shot image generation | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 6.043 | LCM (DPO)
10-shot image generation | DrawBench | Human Preference Alignment (HPSv2) | 0.2814 | LCM (DPO)
10-shot image generation | DrawBench | Text Alignment (SentenceBERT) | 0.572 | LCM (DPO)
10-shot image generation | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 5.6205 | Stable Diffusion 1.5 (DPO)
10-shot image generation | DrawBench | Human Preference Alignment (HPSv2) | 0.2672 | Stable Diffusion 1.5 (DPO)
10-shot image generation | DrawBench | Text Alignment (SentenceBERT) | 0.6075 | Stable Diffusion 1.5 (DPO)
1 Image, 2*2 Stitchi | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 6.043 | LCM (DPO)
1 Image, 2*2 Stitchi | DrawBench | Human Preference Alignment (HPSv2) | 0.2814 | LCM (DPO)
1 Image, 2*2 Stitchi | DrawBench | Text Alignment (SentenceBERT) | 0.572 | LCM (DPO)
1 Image, 2*2 Stitchi | DrawBench | Aesthetics (LAION Aesthetics Predictor) | 5.6205 | Stable Diffusion 1.5 (DPO)
1 Image, 2*2 Stitchi | DrawBench | Human Preference Alignment (HPSv2) | 0.2672 | Stable Diffusion 1.5 (DPO)
1 Image, 2*2 Stitchi | DrawBench | Text Alignment (SentenceBERT) | 0.6075 | Stable Diffusion 1.5 (DPO)

Related Papers

CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)
NeoBabel: A Multilingual Open Tower for Visual Generation (2025-07-08)
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer (2025-07-07)
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis (2025-07-01)
Ovis-U1 Technical Report (2025-06-29)
Rethink Sparse Signals for Pose-guided Text-to-image Generation (2025-06-26)
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation (2025-06-26)