Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah

2024-05-22 · CVPR 2025 · Tasks: Text-to-Image Generation, Image Generation
Paper · PDF · Code (official)

Abstract

Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employing a reward model. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are split into batches according to their difficulty levels, which are gradually used to train the generative model. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on nine benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference. Our code is available at https://github.com/CroitoruAlin/Curriculum-DPO.
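The first training stage described above (rank generations with a reward model, form preference pairs, and bucket them by rank difference so that large-gap "easy" pairs are trained on before small-gap "hard" ones) can be sketched as follows. This is a hypothetical illustration of the sampling logic only, not the authors' implementation; the function name, the number of difficulty levels, and the bucketing rule are all illustrative assumptions.

```python
# Hypothetical sketch of the Curriculum DPO pair-sampling stage: names,
# thresholds, and the bucketing rule are illustrative, not from the paper.
from itertools import combinations

def curriculum_pairs(rewards, num_levels=3):
    """Rank samples by reward, form all (winner, loser) pairs, and bucket
    them into difficulty levels by rank difference (large gap = easy)."""
    # Indices sorted from highest to lowest reward -> rank 0 is best.
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    rank = {idx: r for r, idx in enumerate(order)}

    pairs = []
    for i, j in combinations(range(len(rewards)), 2):
        winner, loser = (i, j) if rewards[i] > rewards[j] else (j, i)
        gap = abs(rank[i] - rank[j])  # rank difference = inverse difficulty
        pairs.append((winner, loser, gap))

    max_gap = max(gap for *_, gap in pairs)
    # Level 0 = easiest (largest rank gap); last level = hardest (smallest gap).
    levels = [[] for _ in range(num_levels)]
    for winner, loser, gap in pairs:
        level = min(num_levels - 1, (max_gap - gap) * num_levels // (max_gap + 1))
        levels[level].append((winner, loser))
    return levels  # train on levels[0] first, then progressively harder ones

# Example: reward-model scores for six generations of one prompt.
batches = curriculum_pairs([0.9, 0.2, 0.7, 0.4, 0.8, 0.1])
```

In the paper's second stage, each difficulty batch would then be fed in order (easy to hard) to the DPO objective when fine-tuning the diffusion or consistency model.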

Results

All results are reported on DrawBench; the site lists the same scores under four task leaderboards (Image Generation, Text-to-Image Generation, 10-shot image generation, and "1 Image, 2*2 Stitchi" [sic]).

Model | Metric | Value
LCM (Curriculum DPO) | Aesthetics (LAION Aesthetics Predictor) | 6.1829
LCM (Curriculum DPO) | Human Preference Alignment (HPSv2) | 0.2851
LCM (Curriculum DPO) | Text Alignment (SentenceBERT) | 0.5812
Stable Diffusion 1.5 (Curriculum DPO) | Aesthetics (LAION Aesthetics Predictor) | 5.706
Stable Diffusion 1.5 (Curriculum DPO) | Human Preference Alignment (HPSv2) | 0.2681
Stable Diffusion 1.5 (Curriculum DPO) | Text Alignment (SentenceBERT) | 0.6234

Related Papers

- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
- FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
- A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing Constraints (2025-07-17)
- Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
- FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
- CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
- CATVis: Context-Aware Thought Visualization (2025-07-15)