Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Object-Centric Image Generation from Layouts

Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R. Devon Hjelm, Shikhar Sharma

Published: 2020-03-16
Tasks: Layout-to-Image Generation, Image Generation

Abstract

Despite recent impressive results on single-object and single-domain image generation, the generation of complex scenes with multiple objects remains challenging. In this paper, we start with the idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well. Our layout-to-image generation method, which we call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity. We also propose changes to the conditioning mechanism of the generator that enhance its object instance-awareness. Apart from improving image quality, our contributions mitigate two failure modes in previous approaches: (1) spurious objects being generated without corresponding bounding boxes in the layout, and (2) overlapping bounding boxes in the layout leading to merged objects in images. Extensive quantitative evaluation and ablation studies demonstrate the impact of our contributions, with our model outperforming previous state-of-the-art approaches on both the COCO-Stuff and Visual Genome datasets. Finally, we address an important limitation of evaluation metrics used in previous works by introducing SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
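The abstract does not spell out how SceneFID is computed. As a rough illustration only, the sketch below assumes an object-centric variant can be obtained by cropping each layout bounding box out of the real and generated images and computing the standard Fréchet distance between Gaussians fitted to features of those crops. The function names (`frechet_distance`, `crop_objects`, `toy_features`, `scene_fid_sketch`) and the box format `(x0, y0, x1, y1)` are hypothetical, and `toy_features` is a cheap stand-in for the Inception embedding a real FID computation would use; this is not the authors' implementation.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

def crop_objects(image, boxes):
    """Crop each (x0, y0, x1, y1) pixel box from an H x W x C image array."""
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]

def toy_features(crop):
    """Hypothetical stand-in for an Inception embedding: per-channel mean and std."""
    flat = crop.reshape(-1, crop.shape[-1]).astype(np.float64)
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])

def scene_fid_sketch(real_images, real_boxes, fake_images, fake_boxes, embed=toy_features):
    """Frechet distance over object crops rather than whole images (assumed design)."""
    real = np.stack([embed(c)
                     for img, boxes in zip(real_images, real_boxes)
                     for c in crop_objects(img, boxes)])
    fake = np.stack([embed(c)
                     for img, boxes in zip(fake_images, fake_boxes)
                     for c in crop_objects(img, boxes)])
    return frechet_distance(real, fake)
```

In practice `embed` would be replaced by a pretrained feature extractor applied to resized crops, and far more crops than feature dimensions would be needed for the covariance estimates to be meaningful.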

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | COCO-Stuff 128x128 | FID | 36.31 | OC-GAN |
| Image Generation | COCO-Stuff 128x128 | Inception Score | 14.6 | OC-GAN |
| Image Generation | COCO-Stuff 128x128 | SceneFID | 16.76 | OC-GAN |
| Image Generation | COCO-Stuff 64x64 | FID | 29.57 | OC-GAN |
| Image Generation | COCO-Stuff 64x64 | Inception Score | 10.8 | OC-GAN |
| Image Generation | Visual Genome 64x64 | FID | 20.27 | OC-GAN |
| Image Generation | Visual Genome 64x64 | Inception Score | 9.3 | OC-GAN |
| Image Generation | COCO-Stuff 256x256 | FID | 41.65 | OC-GAN |
| Image Generation | COCO-Stuff 256x256 | Inception Score | 17.8 | OC-GAN |
| Image Generation | Visual Genome 128x128 | FID | 28.26 | OC-GAN |
| Image Generation | Visual Genome 128x128 | Inception Score | 12.3 | OC-GAN |
| Image Generation | Visual Genome 128x128 | SceneFID | 9.63 | OC-GAN |
| Image Generation | Visual Genome 256x256 | FID | 40.85 | OC-GAN |
| Image Generation | Visual Genome 256x256 | Inception Score | 14.7 | OC-GAN |

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)