Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


InstructPix2Pix: Learning to Follow Image Editing Instructions

Tim Brooks, Aleksander Holynski, Alexei A. Efros

Date: 2022-11-17
Venue: CVPR 2023 1
Tasks: Text to Image Generation, Image Editing, Text-based Image Editing, Language Modelling

Abstract

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on our generated data, and generalizes to real images and user-written instructions at inference time. Since it performs edits in the forward pass and does not require per example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.
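The data-generation idea in the abstract (a language model plus a text-to-image model producing editing examples) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say what the pipeline starts from, so seeding it with captions is an assumption, and `EditExample`, `build_dataset`, `propose_edit`, and `render_pair` are hypothetical names. The two toy functions stand in for GPT-3 and Stable Diffusion and only manipulate strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class EditExample:
    """One editing example: image before, written instruction, image after."""
    input_image: str   # placeholder for image data (here just a string)
    instruction: str
    edited_image: str  # placeholder for image data (here just a string)


# Hypothetical interfaces (assumptions, not real library APIs):
# the language model maps a caption to (instruction, edited caption);
# the text-to-image model maps two captions to a (before, after) image pair.
ProposeEdit = Callable[[str], Tuple[str, str]]
RenderPair = Callable[[str, str], Tuple[str, str]]


def build_dataset(
    captions: Iterable[str],
    propose_edit: ProposeEdit,
    render_pair: RenderPair,
) -> List[EditExample]:
    """Combine the two models to produce (input, instruction, output) examples."""
    examples = []
    for caption in captions:
        instruction, edited_caption = propose_edit(caption)
        before, after = render_pair(caption, edited_caption)
        examples.append(EditExample(before, instruction, after))
    return examples


# Toy stand-ins so the sketch runs without any model.
def toy_propose_edit(caption: str) -> Tuple[str, str]:
    return "make it snowy", caption + " in the snow"


def toy_render_pair(caption: str, edited_caption: str) -> Tuple[str, str]:
    return f"<image of: {caption}>", f"<image of: {edited_caption}>"


dataset = build_dataset(["a photo of a dog"], toy_propose_edit, toy_render_pair)
print(dataset[0])
```

The editing model would then be trained to map `input_image` and `instruction` to `edited_image`; at inference time only a real image and a user-written instruction are needed.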

Results

Task           Dataset       Metric      Value  Model
Image Editing  ImgEdit-Data  Action      1.46   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Add         2.45   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Adjust      1.83   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Background  1.44   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Extract     1.44   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Hybrid      1.2    Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Overall     1.88   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Remove      1.5    Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Replace     2.01   Instruct-Pix2Pix
Image Editing  ImgEdit-Data  Style       3.55   Instruct-Pix2Pix
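As a quick sanity check on the ImgEdit-Data rows, the reported Overall value is numerically consistent with an unweighted mean of the nine per-category scores. The page does not state how Overall is defined, so treating it as a plain average is an assumption; the sketch below only shows that the numbers agree to two decimals.

```python
from statistics import mean

# Per-category scores copied from the results listing (Overall excluded).
category_scores = {
    "Action": 1.46,
    "Add": 2.45,
    "Adjust": 1.83,
    "Background": 1.44,
    "Extract": 1.44,
    "Hybrid": 1.2,
    "Remove": 1.5,
    "Replace": 2.01,
    "Style": 3.55,
}
reported_overall = 1.88

# Assumption: Overall is the unweighted mean of the nine categories.
computed_overall = round(mean(category_scores.values()), 2)
print(computed_overall, computed_overall == reported_overall)
```

The scores sum to 16.88, and 16.88 / 9 is roughly 1.876, which rounds to the listed 1.88.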

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining (2025-07-18)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)