Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, has shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by introducing several key design choices in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions while remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising). The source code and pre-trained models are available at https://github.com/swz30/Restormer.
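
To make the quadratic-cost claim concrete, here is a rough back-of-the-envelope sketch. It assumes standard self-attention in which every pixel is a token, so one attention map for an H×W input has (HW)² entries. The function names and the 4-bytes-per-entry (float32) figure are illustrative assumptions, not taken from the paper or its code.

```python
def spatial_attention_map_entries(height: int, width: int) -> int:
    """Entries in one full pixel-to-pixel attention map (assumes one token per pixel)."""
    tokens = height * width
    return tokens * tokens


def attention_map_gib(height: int, width: int, bytes_per_entry: int = 4) -> float:
    """Memory for one such map in GiB, assuming float32 entries by default."""
    return spatial_attention_map_entries(height, width) * bytes_per_entry / 2**30


if __name__ == "__main__":
    # Doubling each side quadruples the pixel count and multiplies the map size by 16.
    for side in (64, 128, 256):
        entries = spatial_attention_map_entries(side, side)
        print(f"{side}x{side}: {entries} entries, {attention_map_gib(side, side)} GiB")
```

Under these assumptions a single map already takes 1 GiB at 128×128 and 16 GiB at 256×256, per head and per layer, which is why full spatial attention does not scale to high-resolution restoration inputs.
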
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Deblurring | GoPro | PSNR | 32.92 | Restormer |
| Deblurring | GoPro | SSIM | 0.961 | Restormer |
| Deblurring | RealBlur-R (trained on GoPro) | PSNR (sRGB) | 36.19 | Restormer |
| Deblurring | RealBlur-R (trained on GoPro) | SSIM (sRGB) | 0.957 | Restormer |
| Deblurring | MSU BASED | ERQAv2.0 | 0.73875 | Restormer local |
| Deblurring | MSU BASED | LPIPS | 0.08251 | Restormer local |
| Deblurring | MSU BASED | PSNR | 31.12341 | Restormer local |
| Deblurring | MSU BASED | SSIM | 0.94217 | Restormer local |
| Deblurring | MSU BASED | Subjective | 0.1231 | Restormer local |
| Deblurring | MSU BASED | VMAF | 65.25911 | Restormer local |
| Deblurring | MSU BASED | ERQAv2.0 | 0.74776 | Restormer |
| Deblurring | MSU BASED | LPIPS | 0.08239 | Restormer |
| Deblurring | MSU BASED | PSNR | 31.76111 | Restormer |
| Deblurring | MSU BASED | SSIM | 0.94632 | Restormer |
| Deblurring | MSU BASED | Subjective | 0.1175 | Restormer |
| Deblurring | MSU BASED | VMAF | 66.3964 | Restormer |
| Deblurring | RealBlur-J (trained on GoPro) | PSNR (sRGB) | 28.96 | Restormer |
| Deblurring | RealBlur-J (trained on GoPro) | SSIM (sRGB) | 0.879 | Restormer |
| Deblurring | HIDE (trained on GoPro) | PSNR (sRGB) | 31.22 | Restormer |
| Deblurring | HIDE (trained on GoPro) | Params (M) | 26.13 | Restormer |
| Deblurring | HIDE (trained on GoPro) | SSIM (sRGB) | 0.942 | Restormer |
| Deblurring | RSBlur | Average PSNR | 33.69 | Restormer |
| Rain Removal | Test1200 | PSNR | 33.19 | Restormer |
| Rain Removal | Test1200 | SSIM | 0.926 | Restormer |
| Rain Removal | Rain100H | PSNR | 31.46 | Restormer |
| Rain Removal | Rain100H | SSIM | 0.904 | Restormer |
| Rain Removal | Test2800 | PSNR | 34.18 | Restormer |
| Rain Removal | Test2800 | SSIM | 0.944 | Restormer |
| Rain Removal | Test100 | PSNR | 32.00 | Restormer |
| Rain Removal | Test100 | SSIM | 0.923 | Restormer |
| Rain Removal | Rain100L | PSNR | 38.99 | Restormer |
| Rain Removal | Rain100L | SSIM | 0.978 | Restormer |
| Image Restoration | CDD-11 | Average PSNR (dB) | 26.99 | Restormer |
| Image Restoration | CDD-11 | SSIM | 0.8646 | Restormer |
| Image Restoration | ARAD-1K | MRAE | 0.1833 | Restormer |
| Image Restoration | ARAD-1K | PSNR | 33.4 | Restormer |
| Image Restoration | ARAD-1K | RMSE | 0.0274 | Restormer |
| Image Restoration | CSD | Average PSNR (dB) | 35.43 | Restormer |
| Denoising | SIDD | PSNR (sRGB) | 40.02 | Restormer |
| Denoising | SIDD | SSIM (sRGB) | 0.96 | Restormer |
| Denoising | DND | PSNR (sRGB) | 40.03 | Restormer |
| Denoising | DND | SSIM (sRGB) | 0.956 | Restormer |
| Denoising | Kodak24 sigma50 | PSNR | 30.01 | Restormer |
| Denoising | Urban100 sigma15 | Average PSNR | 35.13 | Restormer |
| Denoising | Urban100 sigma50 | PSNR | 30.02 | Restormer |
| Denoising | Urban100 sigma25 | PSNR | 31.46 | Restormer |
| Denoising | Urban100 sigma15 | PSNR | 33.79 | Restormer |
| Denoising | Urban100 sigma50 | PSNR | 28.29 | Restormer |
| Denoising | BSD68 sigma15 | PSNR | 31.96 | Restormer |
| Image Denoising | SIDD | PSNR (sRGB) | 40.02 | Restormer |
| Image Denoising | SIDD | SSIM (sRGB) | 0.96 | Restormer |
| Image Denoising | DND | PSNR (sRGB) | 40.03 | Restormer |
| Image Denoising | DND | SSIM (sRGB) | 0.956 | Restormer |
| Image Deblurring | GoPro | PSNR | 32.92 | Restormer |
| Image Deblurring | GoPro | Params (M) | 26.13 | Restormer |
| Image Deblurring | GoPro | SSIM | 0.961 | Restormer |
| Video deraining | VRDS | PSNR | 29.59 | Restormer |
| Video deraining | VRDS | SSIM | 0.9206 | Restormer |
| Blind Image Deblurring | GoPro | PSNR | 32.92 | Restormer |
| Blind Image Deblurring | GoPro | SSIM | 0.961 | Restormer |
| Blind Image Deblurring | RealBlur-R (trained on GoPro) | PSNR (sRGB) | 36.19 | Restormer |
| Blind Image Deblurring | RealBlur-R (trained on GoPro) | SSIM (sRGB) | 0.957 | Restormer |
| Blind Image Deblurring | MSU BASED | ERQAv2.0 | 0.73875 | Restormer local |
| Blind Image Deblurring | MSU BASED | LPIPS | 0.08251 | Restormer local |
| Blind Image Deblurring | MSU BASED | PSNR | 31.12341 | Restormer local |
| Blind Image Deblurring | MSU BASED | SSIM | 0.94217 | Restormer local |
| Blind Image Deblurring | MSU BASED | Subjective | 0.1231 | Restormer local |
| Blind Image Deblurring | MSU BASED | VMAF | 65.25911 | Restormer local |
| Blind Image Deblurring | MSU BASED | ERQAv2.0 | 0.74776 | Restormer |
| Blind Image Deblurring | MSU BASED | LPIPS | 0.08239 | Restormer |
| Blind Image Deblurring | MSU BASED | PSNR | 31.76111 | Restormer |
| Blind Image Deblurring | MSU BASED | SSIM | 0.94632 | Restormer |
| Blind Image Deblurring | MSU BASED | Subjective | 0.1175 | Restormer |
| Blind Image Deblurring | MSU BASED | VMAF | 66.3964 | Restormer |
| Blind Image Deblurring | RealBlur-J (trained on GoPro) | PSNR (sRGB) | 28.96 | Restormer |
| Blind Image Deblurring | RealBlur-J (trained on GoPro) | SSIM (sRGB) | 0.879 | Restormer |
| Blind Image Deblurring | HIDE (trained on GoPro) | PSNR (sRGB) | 31.22 | Restormer |
| Blind Image Deblurring | HIDE (trained on GoPro) | Params (M) | 26.13 | Restormer |
| Blind Image Deblurring | HIDE (trained on GoPro) | SSIM (sRGB) | 0.942 | Restormer |
| Blind Image Deblurring | RSBlur | Average PSNR | 33.69 | Restormer |
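
Most rows above report PSNR in dB. As a reminder of what that number measures, here is a minimal sketch of the standard definition, PSNR = 10·log10(MAX² / MSE), using NumPy. The function name, the default `data_range` of 1.0, and the toy arrays are assumptions for illustration. The table values come from each benchmark's own evaluation protocol (color space, cropping, averaging), which this sketch does not attempt to reproduce.

```python
import numpy as np


def psnr(reference, restored, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


if __name__ == "__main__":
    clean = np.zeros((8, 8))
    # A constant error of 0.1 gives MSE = 0.01, i.e. about 20 dB at data_range 1.0.
    noisy = np.full((8, 8), 0.1)
    print(psnr(clean, noisy))
```

Because the scale is logarithmic, a 1 dB gain corresponds to roughly a 21% reduction in mean squared error, so small differences between rows can still be meaningful.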