Data Extrapolation for Text-to-image Generation on Small Datasets

Senmao Ye, Fei Liu

2024-10-02
Text-to-Image Generation, Data Augmentation, Image Generation

Abstract

Text-to-image generation requires a large amount of training data to synthesize high-quality images. To augment training data, previous methods rely on data interpolations such as cropping, flipping, and mixing up, which fail to introduce new information and yield only marginal improvements. In this paper, we propose a new data augmentation method for text-to-image generation using linear extrapolation. Specifically, we apply linear extrapolation only to text features, and new image data are retrieved from the internet by search engines. To ensure the reliability of the new text-image pairs, we design two outlier detectors to purify the retrieved images. Based on extrapolation, we construct a training set dozens of times larger than the original dataset, resulting in a significant improvement in text-to-image performance. Moreover, we propose NULL-guidance to refine score estimation, and apply a recurrent affine transformation to fuse text information. Our model achieves FID scores of 7.91, 9.52 and 5.00 on the CUB, Oxford and COCO datasets. The code and data will be available on GitHub (https://github.com/senmaoy/RAT-Diffusion).
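As a rough illustration of what linear extrapolation on text features could look like, the sketch below pushes a feature vector x_i away from a second vector x_j along the line joining them, giving x_i + lam * (x_i - x_j). The abstract does not state the exact formula, the pairing strategy, or the extrapolation strength, so the function names, the random pairing, and the default lam = 0.5 are all assumptions made for this example, not the authors' implementation.

```python
import random


def extrapolate(x_i, x_j, lam=0.5):
    """Move x_i away from x_j along the line through both points.

    Assumed form: x_new = x_i + lam * (x_i - x_j). With lam > 0 the result
    lies outside the segment between x_i and x_j, unlike interpolation.
    """
    return [a + lam * (a - b) for a, b in zip(x_i, x_j)]


def extrapolate_batch(feats, lam=0.5, seed=None):
    """Extrapolate every feature vector against a randomly chosen partner.

    The random pairing is a hypothetical choice for this sketch; the paper
    may select partners differently.
    """
    if len(feats) < 2:
        raise ValueError("need at least two feature vectors to extrapolate")
    rng = random.Random(seed)
    out = []
    for i, x_i in enumerate(feats):
        # Pick any partner index other than i.
        j = rng.choice([k for k in range(len(feats)) if k != i])
        out.append(extrapolate(x_i, feats[j], lam))
    return out
```

In the pipeline the abstract describes, the extrapolated text features would then be paired with images retrieved by search engines and filtered by the outlier detectors; that part is not shown here.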

Results

Task	Dataset	Metric	Value	Model
Image Generation	COCO (Common Objects in Context)	FID	5	RAT-Diffusion
Image Generation	Oxford 102 Flowers	FID	9.52	RAT-Diffusion
Image Generation	Oxford 102 Flowers	Inception score	4.35	RAT-Diffusion
Image Generation	CUB	FID	6.36	RAT-Diffusion
Image Generation	CUB	Inception score	6.56	RAT-Diffusion
Text-to-Image Generation	COCO (Common Objects in Context)	FID	5	RAT-Diffusion
Text-to-Image Generation	Oxford 102 Flowers	FID	9.52	RAT-Diffusion
Text-to-Image Generation	Oxford 102 Flowers	Inception score	4.35	RAT-Diffusion
Text-to-Image Generation	CUB	FID	6.36	RAT-Diffusion
Text-to-Image Generation	CUB	Inception score	6.56	RAT-Diffusion
10-shot image generation	COCO (Common Objects in Context)	FID	5	RAT-Diffusion
10-shot image generation	Oxford 102 Flowers	FID	9.52	RAT-Diffusion
10-shot image generation	Oxford 102 Flowers	Inception score	4.35	RAT-Diffusion
10-shot image generation	CUB	FID	6.36	RAT-Diffusion
10-shot image generation	CUB	Inception score	6.56	RAT-Diffusion
1 Image, 2*2 Stitchi	COCO (Common Objects in Context)	FID	5	RAT-Diffusion
1 Image, 2*2 Stitchi	Oxford 102 Flowers	FID	9.52	RAT-Diffusion
1 Image, 2*2 Stitchi	Oxford 102 Flowers	Inception score	4.35	RAT-Diffusion
1 Image, 2*2 Stitchi	CUB	FID	6.36	RAT-Diffusion
1 Image, 2*2 Stitchi	CUB	Inception score	6.56	RAT-Diffusion
