
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

InstanceDiffusion: Instance-level Control for Image Generation

Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra

2024-02-05 | CVPR 2024 1
Tasks: Conditional Text-to-Image Synthesis, Semantic Segmentation, Instance Segmentation, Image Generation

Abstract

Text-to-image diffusion models produce high-quality images but do not offer control over individual instances in the image. We introduce InstanceDiffusion, which adds precise instance-level control to text-to-image diffusion models. InstanceDiffusion supports free-form language conditions per instance and allows flexible ways to specify instance locations, such as simple single points, scribbles, bounding boxes, or intricate instance segmentation masks, and combinations thereof. We propose three major changes to text-to-image models that enable precise instance-level control. Our UniFusion block enables instance-level conditions for text-to-image models, the ScaleU block improves image fidelity, and our Multi-instance Sampler improves generations for multiple instances. InstanceDiffusion significantly surpasses specialized state-of-the-art models for each location condition. Notably, on the COCO dataset, we outperform the previous state of the art by 20.4% AP$_{50}^{\text{box}}$ for box inputs and 25.4% IoU for mask inputs.
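The abstract reports gains in box AP and mask IoU. As a reminder of what those overlap measures are built on, here is a minimal sketch of the standard intersection-over-union computation for axis-aligned boxes and for binary masks. The function names `box_iou` and `mask_iou` and the `(x_min, y_min, x_max, y_max)` box layout are illustrative assumptions, not taken from the paper or its code release.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two same-sized binary masks (rows of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` compares two 2x2 boxes that share a 1x1 corner, so the overlap is 1 out of a union of 7.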

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | COCO-MIG | instance success rate | 0.51 | Instance Diffusion (zero-shot) |
| Image Generation | COCO-MIG | mIoU | 0.46 | Instance Diffusion (zero-shot) |
| Text-to-Image Generation | COCO-MIG | instance success rate | 0.51 | Instance Diffusion (zero-shot) |
| Text-to-Image Generation | COCO-MIG | mIoU | 0.46 | Instance Diffusion (zero-shot) |
| 10-shot image generation | COCO-MIG | instance success rate | 0.51 | Instance Diffusion (zero-shot) |
| 10-shot image generation | COCO-MIG | mIoU | 0.46 | Instance Diffusion (zero-shot) |
| 1 Image, 2*2 Stitchi | COCO-MIG | instance success rate | 0.51 | Instance Diffusion (zero-shot) |
| 1 Image, 2*2 Stitchi | COCO-MIG | mIoU | 0.46 | Instance Diffusion (zero-shot) |
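The two metrics reported here are aggregates over generated instances. This page does not describe the COCO-MIG protocol, so the following is only a rough sketch under stated assumptions: mIoU is taken as the mean of per-instance IoU values, and an instance counts as a success when its IoU reaches a threshold of 0.5. The function name `summarize_instances` and the threshold are hypothetical, and the actual benchmark may apply additional checks.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_instances(ious: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Aggregate per-instance IoU values into two summary numbers.

    Assumption (not from the source): success means IoU >= threshold.
    """
    if not ious:
        raise ValueError("need at least one per-instance IoU value")
    successes = sum(1 for value in ious if value >= threshold)
    return {
        "mIoU": mean(ious),                       # mean IoU over all instances
        "success_rate": successes / len(ious),    # fraction of instances over threshold
    }
```

Calling `summarize_instances([0.8, 0.4, 0.6, 0.2])` would average the four IoU values and count the two that meet the assumed threshold.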

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)