Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia
Pre-training has produced numerous state-of-the-art results in high-level computer vision, but few attempts have been made to investigate how it acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in different low-level tasks. For example, pre-training introduces more local information to higher layers in super-resolution (SR), yielding significant performance gains, while it hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than the alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformer-based and CNN-based architectures. Based on this study, we develop state-of-the-art models for multiple low-level tasks. Code is released at https://github.com/fenglinglwb/EDT.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Super-Resolution | Set5 - 3x upscaling | PSNR | 35.13 | EDT-B |
| Super-Resolution | Set5 - 3x upscaling | SSIM | 0.9328 | EDT-B |
| Super-Resolution | Set5 - 2x upscaling | PSNR | 38.63 | EDT-B |
| Super-Resolution | Set5 - 2x upscaling | SSIM | 0.9632 | EDT-B |