TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

Zhaoyan Liu, Noel Vouitsis, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem

2023-04-26 · Text-to-Image Generation · Image Generation

Abstract

We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which "translates" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at https://github.com/layer6ai-labs/tr0n.
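The two-stage sampling described in the abstract (a stochastic translator proposes a latent for a condition, then Langevin dynamics refines it) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `generator`, `auxiliary`, `translator`, the squared-error energy, and the step sizes are hypothetical stand-ins, not the networks, energy function, or hyperparameters used in the paper or in the released code. The update is a generic unadjusted Langevin step with a scaled noise term.

```python
import math
import random

def generator(z):
    # Toy stand-in for a pre-trained generator G: latent -> data (a fixed linear map).
    return [2.0 * zi for zi in z]

def auxiliary(x):
    # Toy stand-in for the auxiliary model f: data -> condition space (identity here).
    return list(x)

def energy(z, c):
    # Assumed energy E(z, c) = 0.5 * ||f(G(z)) - c||^2; low when the sample matches c.
    y = auxiliary(generator(z))
    return 0.5 * sum((yi - ci) ** 2 for yi, ci in zip(y, c))

def energy_grad(z, c):
    # Analytic gradient for the toy maps above: dE/dz_i = 2 * (2 * z_i - c_i).
    return [2.0 * (2.0 * zi - ci) for zi, ci in zip(z, c)]

def translator(c, rng):
    # Hypothetical stochastic translator: a rough, noisy guess of a latent for c.
    return [ci / 2.0 + rng.gauss(0.0, 0.5) for ci in c]

def langevin_refine(z, c, rng, steps=200, step_size=0.05, noise_scale=0.01):
    # Unadjusted Langevin updates: a gradient step on E plus scaled Gaussian noise.
    for _ in range(steps):
        g = energy_grad(z, c)
        z = [
            zi - 0.5 * step_size * gi
            + math.sqrt(step_size) * noise_scale * rng.gauss(0.0, 1.0)
            for zi, gi in zip(z, g)
        ]
    return z

rng = random.Random(0)
condition = [1.0, -3.0]
z0 = translator(condition, rng)            # stage 1: translated latent
z1 = langevin_refine(z0, condition, rng)   # stage 2: Langevin refinement
print(energy(z0, condition), energy(z1, condition))
```

In this toy setting the refined latent `z1` has much lower energy than the translator's initial guess `z0`, which mirrors the role the abstract assigns to the Langevin step. In the real method the generator and auxiliary model would be networks such as a GAN and a classifier or CLIP, and the gradient would come from automatic differentiation rather than a closed form.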

Results

Task	Dataset	Metric	Value	Model
Image Generation	COCO (Common Objects in Context)	FID	10.9	TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot)
Text-to-Image Generation	COCO (Common Objects in Context)	FID	10.9	TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot)
10-shot image generation	COCO (Common Objects in Context)	FID	10.9	TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot)
1 Image, 2*2 Stitchi	COCO (Common Objects in Context)	FID	10.9	TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot)
