Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu
3D city generation with NeRF-based methods shows promising results but is computationally inefficient. Recently, 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage overhead (out-of-memory issues), arising from the need to expand points to billions, often demanding hundreds of gigabytes of VRAM for a city scene spanning 10 km². In this paper, we propose GaussianCity, a generative Gaussian Splatting framework dedicated to efficiently synthesizing unbounded 3D cities with a single feed-forward pass. Our key insights are two-fold: 1) Compact 3D Scene Representation: We introduce BEV-Point as a highly compact intermediate representation, ensuring that the growth in VRAM usage for unbounded scenes remains constant, thus enabling unbounded city generation. 2) Spatial-aware Gaussian Attribute Decoder: We present a spatial-aware BEV-Point decoder to produce 3D Gaussian attributes, which leverages a Point Serializer to integrate the structural and contextual characteristics of BEV points. Extensive experiments demonstrate that GaussianCity achieves state-of-the-art results in both drone-view and street-view 3D city generation. Notably, compared to CityDreamer, GaussianCity exhibits superior performance with a speedup of 60 times (10.72 FPS vs. 0.18 FPS).
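The storage and speedup figures in the abstract can be sanity-checked with a short back-of-envelope calculation. The sketch below assumes the common 3D-GS per-Gaussian layout (position, scale, rotation quaternion, opacity, and degree-3 spherical-harmonic colour, all stored as float32); that layout is an assumption for illustration, not a figure taken from the paper, and the function names are hypothetical.

```python
# Back-of-envelope estimate. The per-Gaussian layout is an ASSUMPTION
# (the common 3D-GS parameterization), not a number from the paper.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # position, scale, quaternion, opacity, degree-3 SH colour
BYTES_PER_FLOAT = 4  # float32


def gaussian_storage_gb(num_gaussians: int) -> float:
    """Raw attribute storage in gigabytes (10^9 bytes), ignoring gradients and buffers."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e9


def speedup(fast_fps: float, slow_fps: float) -> float:
    """Ratio of two frame rates."""
    return fast_fps / slow_fps


if __name__ == "__main__":
    # One billion Gaussians under the assumed layout.
    print(f"{gaussian_storage_gb(1_000_000_000):.0f} GB of raw attributes")
    # FPS values quoted in the abstract (GaussianCity vs. CityDreamer).
    print(f"{speedup(10.72, 0.18):.1f}x speedup")
```

Under this assumed layout, a billion Gaussians already need a few hundred gigabytes for attributes alone, which is consistent with the order of magnitude the abstract cites, and the quoted frame rates give a ratio that rounds to the stated 60 times.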
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Scene Generation | KITTI | FID | 29.5 | GaussianCity |
| Scene Generation | KITTI | KID | 0.017 | GaussianCity |
| Scene Generation | GoogleEarth | Camera Error | 0.057 | GaussianCity |
| Scene Generation | GoogleEarth | Depth Error | 0.136 | GaussianCity |
| Scene Generation | GoogleEarth | FID | 86.94 | GaussianCity |
| Scene Generation | GoogleEarth | KID | 0.09 | GaussianCity |