Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Palette: Image-to-Image Diffusion Models

Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

2021-11-10
Tasks: JPEG Decompression, Denoising, Uncropping, Perceptual Distance, Translation, Colorization, Image-to-Image Translation

Abstract

This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates this framework on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss or sophisticated new techniques needed. We uncover the impact of an L2 vs. L1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention in the neural architecture through empirical studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task diffusion model performs as well as or better than task-specific specialist counterparts. Check out https://diffusion-palette.github.io for an overview of the results.
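The L2 vs. L1 choice mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common noise-prediction form of the denoising objective, in which a network conditioned on the source image x predicts the Gaussian noise mixed into the target y; that parameterization, the names `noisy_target`, `denoising_loss`, and `toy_denoiser`, and the array shapes are all assumptions made for illustration and are not taken from this page or from the authors' code.

```python
import numpy as np


def noisy_target(y, noise, gamma):
    """Mix the clean target y with Gaussian noise at level gamma in (0, 1].

    Assumed parameterization: sqrt(gamma) * y + sqrt(1 - gamma) * noise.
    """
    return np.sqrt(gamma) * y + np.sqrt(1.0 - gamma) * noise


def denoising_loss(predicted_noise, noise, p=2):
    """Mean of |predicted_noise - noise|^p over all elements, for p = 1 or 2."""
    if p not in (1, 2):
        raise ValueError("p must be 1 (L1) or 2 (L2)")
    return float(np.mean(np.abs(predicted_noise - noise) ** p))


def toy_denoiser(x, y_noisy, gamma):
    """Hypothetical stand-in for the network: always predicts zero noise."""
    return np.zeros_like(y_noisy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((4, 8, 8, 1))           # conditioning input, e.g. grayscale
    y = rng.random((4, 8, 8, 3))           # clean target, e.g. color image
    noise = rng.standard_normal(y.shape)   # noise the network should recover
    gamma = 0.5                            # illustrative noise level

    y_noisy = noisy_target(y, noise, gamma)
    pred = toy_denoiser(x, y_noisy, gamma)

    # Same prediction, two objectives: only the exponent p differs.
    print("L1 loss:", denoising_loss(pred, noise, p=1))
    print("L2 loss:", denoising_loss(pred, noise, p=2))
```

Only the exponent changes between the two objectives, so switching between them in a training loop would be a one-argument change under these assumptions.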

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | Places2 val | FID | 11.7 | Palette (20-30% free form) |
| Image Generation | Places2 val | PD | 35 | Palette (20-30% free form) |
| Image Generation | Places2 val | FID | 11.9 | Palette (128×128 center mask) |
| Image Generation | Places2 val | PD | 57.3 | Palette (128×128 center mask) |
| Image Inpainting | Places2 val | FID | 11.7 | Palette (20-30% free form) |
| Image Inpainting | Places2 val | PD | 35 | Palette (20-30% free form) |
| Image Inpainting | Places2 val | FID | 11.9 | Palette (128×128 center mask) |
| Image Inpainting | Places2 val | PD | 57.3 | Palette (128×128 center mask) |
| Colorization | ImageNet val | FID-5K | 15.78 | Palette |
| Colorization | ImageNet ctest10k | FID | 3.4 | Palette |
| Uncropping | Places2 val | FID | 3.53 | Palette |
| Uncropping | Places2 val | Fool rate | 39.9 | Palette |
| Uncropping | Places2 val | PD | 103.3 | Palette |
| JPEG Decompression | ImageNet | CA | 73.5 | Palette (QF: 20) |
| JPEG Decompression | ImageNet | FID-5K | 4.3 | Palette (QF: 20) |
| JPEG Decompression | ImageNet | IS | 208.7 | Palette (QF: 20) |
| JPEG Decompression | ImageNet | PD | 37.1 | Palette (QF: 20) |
| JPEG Decompression | ImageNet | CA | 70.7 | Palette (QF: 10) |
| JPEG Decompression | ImageNet | FID-5K | 5.4 | Palette (QF: 10) |
| JPEG Decompression | ImageNet | IS | 180.5 | Palette (QF: 10) |
| JPEG Decompression | ImageNet | PD | 58.3 | Palette (QF: 10) |
| JPEG Decompression | ImageNet | CA | 64.2 | Palette (QF: 5) |
| JPEG Decompression | ImageNet | FID-5K | 8.3 | Palette (QF: 5) |
| JPEG Decompression | ImageNet | IS | 133.6 | Palette (QF: 5) |
| JPEG Decompression | ImageNet | PD | 95.5 | Palette (QF: 5) |
| JPEG Decompression | ImageNet | CA | 69.7 | Regression (QF: 20) |
| JPEG Decompression | ImageNet | FID-5K | 11.5 | Regression (QF: 20) |
| JPEG Decompression | ImageNet | IS | 158.7 | Regression (QF: 20) |
| JPEG Decompression | ImageNet | PD | 65.4 | Regression (QF: 20) |
| JPEG Decompression | ImageNet | CA | 63.5 | Regression (QF: 10) |
| JPEG Decompression | ImageNet | FID-5K | 18 | Regression (QF: 10) |
| JPEG Decompression | ImageNet | IS | 117.2 | Regression (QF: 10) |
| JPEG Decompression | ImageNet | PD | 102.2 | Regression (QF: 10) |
| JPEG Decompression | ImageNet | CA | 52.8 | Regression (QF: 5) |
| JPEG Decompression | ImageNet | FID-5K | 29 | Regression (QF: 5) |
| JPEG Decompression | ImageNet | IS | 73.9 | Regression (QF: 5) |
| JPEG Decompression | ImageNet | PD | 155.4 | Regression (QF: 5) |
