Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Ruifeng Deng, Xin Li, Errui Ding, Hao Wang

2021-08-09 · ICCV 2021 · Style Transfer · Reinforcement Learning · Prediction · Object Detection

Paper · PDF · Code (official)

Abstract

Neural painting refers to the procedure of producing a series of strokes for a given image and recreating it non-photo-realistically using neural networks. While reinforcement learning (RL) agents can generate a stroke sequence step by step for this task, training a stable RL agent is difficult. Stroke-optimization methods, on the other hand, search iteratively for a set of stroke parameters in a large search space; this low efficiency significantly limits their prevalence and practicality. Unlike previous methods, we formulate the task as a set prediction problem and propose a novel Transformer-based framework, dubbed Paint Transformer, that predicts the parameters of a stroke set with a feed-forward network. In this way, our model can generate a set of strokes in parallel and produce the final painting of size 512×512 in near real time. More importantly, since no dataset is available for training Paint Transformer, we devise a self-training pipeline so that it can be trained without any off-the-shelf dataset while still achieving excellent generalization capability. Experiments demonstrate that our method achieves better painting performance than previous ones, with cheaper training and inference costs. Code and models are available.
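The key idea in the abstract, predicting a whole stroke set in one forward pass rather than rolling out strokes one at a time with an RL agent, can be illustrated with a minimal sketch. The stroke parameterization, dimensions, and the plain linear head below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Hypothetical stroke parameterization (not the paper's exact one):
# each stroke = (x, y, height, width, angle, r, g, b) -> 8 parameters.
STROKE_DIM = 8
N_STROKES = 16   # number of strokes predicted per image, in parallel
FEAT_DIM = 32    # size of the (assumed) image feature vector

rng = np.random.default_rng(0)

# Stand-in for the feed-forward prediction head: a single linear map
# from an image feature vector to ALL stroke parameters at once
# (set prediction), instead of a step-by-step sequential rollout.
W = rng.normal(scale=0.1, size=(FEAT_DIM, N_STROKES * STROKE_DIM))
b = np.zeros(N_STROKES * STROKE_DIM)

def predict_stroke_set(feat: np.ndarray) -> np.ndarray:
    """Map one feature vector to a (N_STROKES, STROKE_DIM) stroke set."""
    out = feat @ W + b
    # Sigmoid squashes every parameter into (0, 1), so positions,
    # sizes, and colors are all normalized coordinates.
    out = 1.0 / (1.0 + np.exp(-out))
    return out.reshape(N_STROKES, STROKE_DIM)

feat = rng.normal(size=FEAT_DIM)
strokes = predict_stroke_set(feat)
print(strokes.shape)  # -> (16, 8): all strokes from one forward pass
```

Because the whole set comes out of a single forward pass, inference cost is one network evaluation per image, which is what makes near-real-time painting plausible compared with iterative stroke optimization or per-step RL rollouts.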

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | SIXray | 1 in 10 R@5 | 0.073 | Optim [39] Lpixel |
| Object Detection | A2D | Mean IoU | 5.8 | RL [10] Lpixel |
| Object Detection | COCO 2017 | Mean mAP | 4.2 | Lpixel |

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
- VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)