Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

2022-03-21 · Translation · 3D-Aware Image Synthesis

Abstract

Image translation and manipulation have gained increasing attention along with the rapid development of deep generative models. Although existing approaches have brought impressive results, they mainly operate in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, which aims to reconstruct a 3D scene modelled by NeRF, conditioned on a single-view semantic mask as input. To kick off this novel task, we propose the Sem2NeRF framework. In particular, Sem2NeRF addresses this highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pre-trained decoder. To further improve the accuracy of the mapping, we integrate a new region-aware learning strategy into the design of both the encoder and the decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets. Code and video are available at https://donydchen.github.io/sem2nerf/

Results

Task	Dataset	Metric	Value	Model
Image Generation	CelebAMask-HQ	FID	41.52	Sem2NeRF
Image Generation	CelebAMask-HQ	IS	2.03	Sem2NeRF
Image Generation	CelebAMask-HQ	FID	55.56	pSp
Image Generation	CelebAMask-HQ	IS	1.74	pSp
Image Generation	CelebAMask-HQ	FID	67.32	pix2pixHD
Image Generation	CelebAMask-HQ	IS	1.72	pix2pixHD
3D	CelebAMask-HQ	FID	41.52	Sem2NeRF
3D	CelebAMask-HQ	IS	2.03	Sem2NeRF
3D	CelebAMask-HQ	FID	55.56	pSp
3D	CelebAMask-HQ	IS	1.74	pSp
3D	CelebAMask-HQ	FID	67.32	pix2pixHD
3D	CelebAMask-HQ	IS	1.72	pix2pixHD
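
For reading the table: FID is a distance, so lower is better, while IS is a score, so higher is better. The short Python sketch below copies the CelebAMask-HQ numbers from the table, ranks the three models on each metric, and computes how much Sem2NeRF lowers FID relative to each baseline. The helper names (rank_models, relative_fid_reduction) are hypothetical and exist only for this illustration; they are not part of the Sem2NeRF code release.

```python
# Scores copied from the results table above (CelebAMask-HQ).
RESULTS = {
    "Sem2NeRF": {"FID": 41.52, "IS": 2.03},
    "pSp": {"FID": 55.56, "IS": 1.74},
    "pix2pixHD": {"FID": 67.32, "IS": 1.72},
}


def rank_models(results, metric, lower_is_better):
    """Return model names sorted from best to worst on one metric."""
    return sorted(
        results,
        key=lambda model: results[model][metric],
        reverse=not lower_is_better,
    )


def relative_fid_reduction(results, model, baseline):
    """Fraction by which `model` lowers FID relative to `baseline`."""
    base = results[baseline]["FID"]
    return (base - results[model]["FID"]) / base


if __name__ == "__main__":
    # FID: lower is better. IS: higher is better.
    print("FID ranking:", rank_models(RESULTS, "FID", lower_is_better=True))
    print("IS ranking: ", rank_models(RESULTS, "IS", lower_is_better=False))
    for baseline in ("pSp", "pix2pixHD"):
        reduction = relative_fid_reduction(RESULTS, "Sem2NeRF", baseline)
        print(f"FID reduction vs {baseline}: {reduction:.1%}")
```

Both rankings place Sem2NeRF first, followed by pSp and then pix2pixHD, which matches the ordering of the rows in the table.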
