Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Ruifeng Deng, Xin Li, Errui Ding, Hao Wang

2021-08-09 · ICCV 2021 · Style Transfer · Reinforcement Learning · Prediction · Object Detection

Paper · PDF · Code (official)

Abstract

Neural painting refers to the procedure of producing a series of strokes for a given image and recreating it non-photo-realistically using neural networks. While reinforcement learning (RL) agents can generate a stroke sequence step by step for this task, training a stable RL agent is difficult. Stroke-optimization methods, on the other hand, search iteratively for a set of stroke parameters in a large search space; this low efficiency significantly limits their prevalence and practicality. Unlike previous methods, we formulate the task as a set prediction problem and propose a novel Transformer-based framework, dubbed Paint Transformer, that predicts the parameters of a stroke set with a feed-forward network. In this way, our model can generate a set of strokes in parallel and produce the final painting of size 512×512 in near real time. More importantly, since no dataset is available for training Paint Transformer, we devise a self-training pipeline so that it can be trained without any off-the-shelf dataset while still achieving excellent generalization capability. Experiments demonstrate that our method achieves better painting performance than previous ones, with cheaper training and inference costs. Code and models are available.
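The key idea in the abstract, predicting a whole stroke set in one forward pass rather than rolling out strokes one at a time with an RL agent, can be illustrated with a minimal sketch. The stroke parameterization, dimensions, and the plain linear head below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Hypothetical stroke parameterization (not the paper's exact one):
# each stroke = (x, y, height, width, angle, r, g, b) -> 8 parameters.
STROKE_DIM = 8
N_STROKES = 16   # number of strokes predicted per image, in parallel
FEAT_DIM = 32    # size of the (assumed) image feature vector

rng = np.random.default_rng(0)

# Stand-in for the feed-forward prediction head: a single linear map
# from an image feature vector to ALL stroke parameters at once
# (set prediction), instead of a step-by-step sequential rollout.
W = rng.normal(scale=0.1, size=(FEAT_DIM, N_STROKES * STROKE_DIM))
b = np.zeros(N_STROKES * STROKE_DIM)

def predict_stroke_set(feat: np.ndarray) -> np.ndarray:
    """Map one feature vector to a (N_STROKES, STROKE_DIM) stroke set."""
    out = feat @ W + b
    # Sigmoid squashes every parameter into (0, 1), so positions,
    # sizes, and colors are all normalized coordinates.
    out = 1.0 / (1.0 + np.exp(-out))
    return out.reshape(N_STROKES, STROKE_DIM)

feat = rng.normal(size=FEAT_DIM)
strokes = predict_stroke_set(feat)
print(strokes.shape)  # -> (16, 8): all strokes from one forward pass
```

Because the whole set comes out of a single forward pass, inference cost is one network evaluation per image, which is what makes near-real-time painting plausible compared with iterative stroke optimization or per-step RL rollouts.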

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | SIXray | 1 in 10 R@5 | 0.073 | Optim [39] Lpixel |
| Object Detection | A2D | Mean IoU | 5.8 | RL [10] Lpixel |
| Object Detection | COCO 2017 | Mean mAP | 4.2 | Lpixel |

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
- VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)