Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand
In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD). The architecture of our U$^2$-Net is a two-level nested U-structure. The design has the following advantages: (1) it is able to capture more contextual information from different scales thanks to the mixture of receptive fields of different sizes in our proposed ReSidual U-blocks (RSU), and (2) it increases the depth of the whole architecture without significantly increasing the computational cost because of the pooling operations used in these RSU blocks. This architecture enables us to train a deep network from scratch without using backbones from image classification tasks. We instantiate two models of the proposed architecture, U$^2$-Net (176.3 MB, 30 FPS on a GTX 1080Ti GPU) and U$^2$-Net$^{\dagger}$ (4.7 MB, 40 FPS), to facilitate use in different environments. Both models achieve competitive performance on six SOD datasets. The code is available at https://github.com/NathanUA/U-2-Net.
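The two advantages claimed above (larger, mixed receptive fields and extra depth at little extra cost) can be illustrated with some simple arithmetic. The sketch below is not the authors' RSU implementation: it assumes a simplified encoder made only of 3x3 convolutions with 2x2 stride-2 pooling between levels, and it assumes equal channel widths at every level. The function names (`receptive_field`, `pooled_encoder`, `relative_conv_cost`) are hypothetical helpers introduced for this illustration.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a chain of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def pooled_encoder(levels):
    """Simplified encoder: a 3x3 conv per level, 2x2 stride-2 pooling between levels."""
    layers = []
    for level in range(levels):
        layers.append((3, 1))          # 3x3 convolution, stride 1
        if level < levels - 1:
            layers.append((2, 2))      # 2x2 pooling, stride 2
    return layers


def relative_conv_cost(levels, pooled):
    """Total conv cost in units of one full-resolution conv.

    Assumes equal channel widths, so cost scales with the number of pixels,
    which shrinks by a factor of 4 after each 2x2 pooling step.
    """
    return sum(0.25 ** level if pooled else 1.0 for level in range(levels))


plain = [(3, 1)] * 4       # four stacked 3x3 convs, no pooling
nested = pooled_encoder(4)  # four convs with pooling in between

print("receptive field, plain :", receptive_field(plain))
print("receptive field, pooled:", receptive_field(nested))
print("relative cost, plain   :", relative_conv_cost(4, pooled=False))
print("relative cost, pooled  :", relative_conv_cost(4, pooled=True))
```

Under these assumptions, four convolutions with pooling see a 38-pixel window at roughly a third of the cost of four full-resolution convolutions, which see only 9 pixels. The intermediate levels also expose receptive fields of 3, 8, and 18 pixels, which is one way to read the "mixture of receptive fields" described in the abstract.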
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Saliency Detection | DUT-OMRON | MAE | 0.054 | U2-Net |
| Saliency Detection | DUT-OMRON | Fwβ | 0.731 | U2-Net+ |
| Saliency Detection | DUT-OMRON | MAE | 0.06 | U2-Net+ |
| Saliency Detection | DUT-OMRON | Sm | 0.837 | U2-Net+ |
| Saliency Detection | DUT-OMRON | relaxFbβ | 0.676 | U2-Net+ |
| Saliency Detection | DUT-OMRON | max Fβ | 0.813 | U2-Net+ |
| Saliency Detection | HKU-IS | Fwβ | 0.867 | U2-Net+ |
| Saliency Detection | HKU-IS | MAE | 0.037 | U2-Net+ |
| Saliency Detection | HKU-IS | Sm | 0.908 | U2-Net+ |
| Saliency Detection | HKU-IS | relaxFbβ | 0.794 | U2-Net+ |
| Saliency Detection | HKU-IS | max Fβ | 0.928 | U2-Net+ |
| Object Detection | DIS-TE4 | E-measure | 0.847 | U2Net |
| Object Detection | DIS-TE4 | HCE | 3653 | U2Net |
| Object Detection | DIS-TE4 | MAE | 0.087 | U2Net |
| Object Detection | DIS-TE4 | S-Measure | 0.807 | U2Net |
| Object Detection | DIS-TE4 | max F-Measure | 0.795 | U2Net |
| Object Detection | DIS-TE4 | weighted F-measure | 0.705 | U2Net |
| Object Detection | DIS-VD | E-measure | 0.823 | U2Net |
| Object Detection | DIS-VD | HCE | 1413 | U2Net |
| Object Detection | DIS-VD | MAE | 0.09 | U2Net |
| Object Detection | DIS-VD | S-Measure | 0.781 | U2Net |
| Object Detection | DIS-VD | max F-Measure | 0.748 | U2Net |
| Object Detection | DIS-VD | weighted F-measure | 0.656 | U2Net |
| Object Detection | DIS-TE2 | E-measure | 0.833 | U2Net |
| Object Detection | DIS-TE2 | HCE | 490 | U2Net |
| Object Detection | DIS-TE2 | MAE | 0.085 | U2Net |
| Object Detection | DIS-TE2 | S-Measure | 0.788 | U2Net |
| Object Detection | DIS-TE2 | max F-Measure | 0.756 | U2Net |
| Object Detection | DIS-TE2 | weighted F-measure | 0.668 | U2Net |
| Object Detection | DIS-TE1 | E-measure | 0.801 | U2Net |
| Object Detection | DIS-TE1 | HCE | 224 | U2Net |
| Object Detection | DIS-TE1 | MAE | 0.083 | U2Net |
| Object Detection | DIS-TE1 | S-Measure | 0.76 | U2Net |
| Object Detection | DIS-TE1 | max F-Measure | 0.694 | U2Net |
| Object Detection | DIS-TE1 | weighted F-measure | 0.601 | U2Net |
| Object Detection | DIS-TE3 | E-measure | 0.858 | U2Net |
| Object Detection | DIS-TE3 | HCE | 965 | U2Net |
| Object Detection | DIS-TE3 | MAE | 0.079 | U2Net |
| Object Detection | DIS-TE3 | S-Measure | 0.809 | U2Net |
| Object Detection | DIS-TE3 | max F-Measure | 0.798 | U2Net |
| Object Detection | DIS-TE3 | weighted F-measure | 0.707 | U2Net |
| RGB Salient Object Detection | DIS-TE4 | E-measure | 0.847 | U2Net |
| RGB Salient Object Detection | DIS-TE4 | HCE | 3653 | U2Net |
| RGB Salient Object Detection | DIS-TE4 | MAE | 0.087 | U2Net |
| RGB Salient Object Detection | DIS-TE4 | S-Measure | 0.807 | U2Net |
| RGB Salient Object Detection | DIS-TE4 | max F-Measure | 0.795 | U2Net |
| RGB Salient Object Detection | DIS-TE4 | weighted F-measure | 0.705 | U2Net |
| RGB Salient Object Detection | DIS-VD | E-measure | 0.823 | U2Net |
| RGB Salient Object Detection | DIS-VD | HCE | 1413 | U2Net |
| RGB Salient Object Detection | DIS-VD | MAE | 0.09 | U2Net |
| RGB Salient Object Detection | DIS-VD | S-Measure | 0.781 | U2Net |
| RGB Salient Object Detection | DIS-VD | max F-Measure | 0.748 | U2Net |
| RGB Salient Object Detection | DIS-VD | weighted F-measure | 0.656 | U2Net |
| RGB Salient Object Detection | DIS-TE2 | E-measure | 0.833 | U2Net |
| RGB Salient Object Detection | DIS-TE2 | HCE | 490 | U2Net |
| RGB Salient Object Detection | DIS-TE2 | MAE | 0.085 | U2Net |
| RGB Salient Object Detection | DIS-TE2 | S-Measure | 0.788 | U2Net |
| RGB Salient Object Detection | DIS-TE2 | max F-Measure | 0.756 | U2Net |
| RGB Salient Object Detection | DIS-TE2 | weighted F-measure | 0.668 | U2Net |
| RGB Salient Object Detection | DIS-TE1 | E-measure | 0.801 | U2Net |
| RGB Salient Object Detection | DIS-TE1 | HCE | 224 | U2Net |
| RGB Salient Object Detection | DIS-TE1 | MAE | 0.083 | U2Net |
| RGB Salient Object Detection | DIS-TE1 | S-Measure | 0.76 | U2Net |
| RGB Salient Object Detection | DIS-TE1 | max F-Measure | 0.694 | U2Net |
| RGB Salient Object Detection | DIS-TE1 | weighted F-measure | 0.601 | U2Net |
| RGB Salient Object Detection | DIS-TE3 | E-measure | 0.858 | U2Net |
| RGB Salient Object Detection | DIS-TE3 | HCE | 965 | U2Net |
| RGB Salient Object Detection | DIS-TE3 | MAE | 0.079 | U2Net |
| RGB Salient Object Detection | DIS-TE3 | S-Measure | 0.809 | U2Net |
| RGB Salient Object Detection | DIS-TE3 | max F-Measure | 0.798 | U2Net |
| RGB Salient Object Detection | DIS-TE3 | weighted F-measure | 0.707 | U2Net |
| Salient Object Detection | ECSSD | MAE | 0.041 | F3Net |
| Salient Object Detection | ECSSD | S-measure | 0.918 | F3Net |
| Salient Object Detection | SOD | Fwβ | 0.697 | U2-Net+ |
| Salient Object Detection | SOD | MAE | 0.124 | U2-Net+ |
| Salient Object Detection | SOD | Sm | 0.759 | U2-Net+ |
| Salient Object Detection | SOD | relaxFbβ | 0.559 | U2-Net+ |
| Salient Object Detection | SOD | max Fβ | 0.841 | U2-Net+ |
| Salient Object Detection | HKU-IS | MAE | 0.031 | U2Net |
| Salient Object Detection | PASCAL-S | MAE | 0.086 | F3Net |
| Salient Object Detection | PASCAL-S | S-measure | 0.831 | F3Net |
| Salient Object Detection | PASCAL-S | max_F1 | 0.768 | F3Net |
| 2D Object Detection | DIS-TE4 | E-measure | 0.847 | U2Net |
| 2D Object Detection | DIS-TE4 | HCE | 3653 | U2Net |
| 2D Object Detection | DIS-TE4 | MAE | 0.087 | U2Net |
| 2D Object Detection | DIS-TE4 | S-Measure | 0.807 | U2Net |
| 2D Object Detection | DIS-TE4 | max F-Measure | 0.795 | U2Net |
| 2D Object Detection | DIS-TE4 | weighted F-measure | 0.705 | U2Net |
| 2D Object Detection | DIS-VD | E-measure | 0.823 | U2Net |
| 2D Object Detection | DIS-VD | HCE | 1413 | U2Net |
| 2D Object Detection | DIS-VD | MAE | 0.09 | U2Net |
| 2D Object Detection | DIS-VD | S-Measure | 0.781 | U2Net |
| 2D Object Detection | DIS-VD | max F-Measure | 0.748 | U2Net |
| 2D Object Detection | DIS-VD | weighted F-measure | 0.656 | U2Net |
| 2D Object Detection | DIS-TE2 | E-measure | 0.833 | U2Net |
| 2D Object Detection | DIS-TE2 | HCE | 490 | U2Net |
| 2D Object Detection | DIS-TE2 | MAE | 0.085 | U2Net |
| 2D Object Detection | DIS-TE2 | S-Measure | 0.788 | U2Net |
| 2D Object Detection | DIS-TE2 | max F-Measure | 0.756 | U2Net |
| 2D Object Detection | DIS-TE2 | weighted F-measure | 0.668 | U2Net |
| 2D Object Detection | DIS-TE1 | E-measure | 0.801 | U2Net |
| 2D Object Detection | DIS-TE1 | HCE | 224 | U2Net |
| 2D Object Detection | DIS-TE1 | MAE | 0.083 | U2Net |
| 2D Object Detection | DIS-TE1 | S-Measure | 0.76 | U2Net |
| 2D Object Detection | DIS-TE1 | max F-Measure | 0.694 | U2Net |
| 2D Object Detection | DIS-TE1 | weighted F-measure | 0.601 | U2Net |
| 2D Object Detection | DIS-TE3 | E-measure | 0.858 | U2Net |
| 2D Object Detection | DIS-TE3 | HCE | 965 | U2Net |
| 2D Object Detection | DIS-TE3 | MAE | 0.079 | U2Net |
| 2D Object Detection | DIS-TE3 | S-Measure | 0.809 | U2Net |
| 2D Object Detection | DIS-TE3 | max F-Measure | 0.798 | U2Net |
| 2D Object Detection | DIS-TE3 | weighted F-measure | 0.707 | U2Net |