
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GAUDI: A Neural Architect for Immersive 3D Scene Generation

Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

Date: 2022-07-27
Tasks: Scene Generation, Image Generation

Abstract

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene.

Results

Task              Dataset      Metric      Value   Model
Image Generation  ARKitScenes  FID         37.35   GAUDI
Image Generation  ARKitScenes  FID (SwAV)  4.14    GAUDI
Image Generation  ARKitScenes  FID         79.54   GSN
Image Generation  ARKitScenes  FID (SwAV)  10.21   GSN
Image Generation  ARKitScenes  FID         87.06   GRAF
Image Generation  ARKitScenes  FID (SwAV)  13.44   GRAF
Image Generation  ARKitScenes  FID         134.8   π-GAN
Image Generation  ARKitScenes  FID (SwAV)  15.58   π-GAN
Image Generation  VLN-CE       FID         18.52   GAUDI
Image Generation  VLN-CE       FID (SwAV)  3.63    GAUDI
Image Generation  VLN-CE       FID         43.32   GSN
Image Generation  VLN-CE       FID (SwAV)  6.19    GSN
Image Generation  VLN-CE       FID         90.43   GRAF
Image Generation  VLN-CE       FID (SwAV)  8.65    GRAF
Image Generation  VLN-CE       FID         151.26  π-GAN
Image Generation  VLN-CE       FID (SwAV)  14.07   π-GAN
Image Generation  VizDoom      FID         33.7    GAUDI
Image Generation  VizDoom      FID (SwAV)  3.24    GAUDI
Image Generation  VizDoom      FID         143.55  π-GAN
Image Generation  VizDoom      FID (SwAV)  15.26   π-GAN
Image Generation  Replica      FID         18.75   GAUDI
Image Generation  Replica      FID (SwAV)  1.76    GAUDI
Image Generation  Replica      FID         166.55  π-GAN
Image Generation  Replica      FID (SwAV)  13.17   π-GAN
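
For FID and FID (SwAV), lower values indicate that generated images are closer to the real data distribution. The sketch below is one way to check the abstract's state-of-the-art claim against the rows listed above. It is illustrative only: the names `RESULTS`, `best_per_benchmark`, and `runner_up_ratio` are hypothetical helpers introduced here and are not part of the paper or its official code. The numbers are copied from the Results table.

```python
# Rows copied from the Results table: (dataset, metric, value, model).
RESULTS = [
    ("ARKitScenes", "FID", 37.35, "GAUDI"),
    ("ARKitScenes", "FID (SwAV)", 4.14, "GAUDI"),
    ("ARKitScenes", "FID", 79.54, "GSN"),
    ("ARKitScenes", "FID (SwAV)", 10.21, "GSN"),
    ("ARKitScenes", "FID", 87.06, "GRAF"),
    ("ARKitScenes", "FID (SwAV)", 13.44, "GRAF"),
    ("ARKitScenes", "FID", 134.8, "π-GAN"),
    ("ARKitScenes", "FID (SwAV)", 15.58, "π-GAN"),
    ("VLN-CE", "FID", 18.52, "GAUDI"),
    ("VLN-CE", "FID (SwAV)", 3.63, "GAUDI"),
    ("VLN-CE", "FID", 43.32, "GSN"),
    ("VLN-CE", "FID (SwAV)", 6.19, "GSN"),
    ("VLN-CE", "FID", 90.43, "GRAF"),
    ("VLN-CE", "FID (SwAV)", 8.65, "GRAF"),
    ("VLN-CE", "FID", 151.26, "π-GAN"),
    ("VLN-CE", "FID (SwAV)", 14.07, "π-GAN"),
    ("VizDoom", "FID", 33.7, "GAUDI"),
    ("VizDoom", "FID (SwAV)", 3.24, "GAUDI"),
    ("VizDoom", "FID", 143.55, "π-GAN"),
    ("VizDoom", "FID (SwAV)", 15.26, "π-GAN"),
    ("Replica", "FID", 18.75, "GAUDI"),
    ("Replica", "FID (SwAV)", 1.76, "GAUDI"),
    ("Replica", "FID", 166.55, "π-GAN"),
    ("Replica", "FID (SwAV)", 13.17, "π-GAN"),
]


def best_per_benchmark(rows):
    """Map (dataset, metric) to the (model, value) pair with the lowest value."""
    best = {}
    for dataset, metric, value, model in rows:
        key = (dataset, metric)
        if key not in best or value < best[key][1]:
            best[key] = (model, value)
    return best


def runner_up_ratio(rows):
    """Map (dataset, metric) to second-lowest value divided by lowest value."""
    by_key = {}
    for dataset, metric, value, _model in rows:
        by_key.setdefault((dataset, metric), []).append(value)
    ratios = {}
    for key, values in by_key.items():
        ordered = sorted(values)
        if len(ordered) >= 2:
            ratios[key] = ordered[1] / ordered[0]
    return ratios


if __name__ == "__main__":
    ratios = runner_up_ratio(RESULTS)
    for key, (model, value) in best_per_benchmark(RESULTS).items():
        dataset, metric = key
        print(f"{dataset:12s} {metric:11s} best={model} ({value}), "
              f"runner-up is {ratios[key]:.2f}x higher")
```

Running the script prints the lowest-scoring model for each dataset and metric pair, along with how much higher the next-best listed score is. Note that VizDoom and Replica list only GAUDI and π-GAN, so the comparison there covers two models rather than four.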

Related Papers

World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)