Runmin Cong, Qi Qin, Chen Zhang, Qiuping Jiang, Shiqi Wang, Yao Zhao, Sam Kwong
Fully-supervised salient object detection (SOD) methods have made great progress, but they often rely on a large number of pixel-level annotations, which are time-consuming and labour-intensive to produce. In this paper, we focus on a new weakly-supervised SOD task under hybrid labels, where the supervision consists of a large number of coarse labels generated by a traditional unsupervised method and a small number of real labels. To address the label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies. In terms of the model framework, we decouple the task into a label refinement sub-task and a salient object detection sub-task, which cooperate with each other and are trained alternately. Specifically, the R-Net is designed as a two-stream encoder-decoder model equipped with a Blender with Guidance and Aggregation Mechanisms (BGA), aiming to rectify the coarse labels into more reliable pseudo-labels, while the S-Net is a replaceable SOD network supervised by the pseudo-labels generated by the current R-Net. Note that only the trained S-Net is needed for testing. Moreover, to guarantee the effectiveness and efficiency of network training, we design three training strategies: an alternate iteration mechanism, a group-wise incremental mechanism, and a credibility verification mechanism. Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly-supervised/unsupervised methods both qualitatively and quantitatively.
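
The alternating schedule described in the abstract can be pictured with a small control-flow sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how groups are formed, how credibility is verified, or what each training step optimises, so `split_into_groups`, `alternate_training`, and every callable passed in (`train_r_net`, `refine_labels`, `is_credible`, `train_s_net`) are hypothetical placeholders.

```python
from typing import Callable, List, Sequence

Sample = str  # hypothetical stand-in for an image identifier


def split_into_groups(samples: Sequence[Sample], num_groups: int) -> List[List[Sample]]:
    """Split coarse-labelled samples into nearly equal, contiguous groups."""
    size, rem = divmod(len(samples), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        # The first `rem` groups receive one extra sample each.
        end = start + size + (1 if g < rem else 0)
        groups.append(list(samples[start:end]))
        start = end
    return groups


def alternate_training(
    real_labeled: Sequence[Sample],
    coarse_groups: Sequence[Sequence[Sample]],
    train_r_net: Callable[[Sequence[Sample], Sequence[Sample]], None],
    refine_labels: Callable[[Sequence[Sample]], List[Sample]],
    is_credible: Callable[[Sample], bool],
    train_s_net: Callable[[Sequence[Sample]], None],
) -> List[int]:
    """Alternate R-Net and S-Net steps while adding one group per round.

    Returns the number of accepted pseudo-labelled samples after each round.
    """
    trusted: List[Sample] = []  # pseudo-labelled samples accepted so far
    history: List[int] = []
    for group in coarse_groups:
        # R-Net step (assumed to use the real labels plus accepted pseudo-labels).
        train_r_net(real_labeled, trusted)
        # Rectify the coarse labels of the newly added group.
        pseudo = refine_labels(group)
        # Credibility check: a hypothetical per-sample predicate.
        trusted.extend(p for p in pseudo if is_credible(p))
        # S-Net step, supervised by the pseudo-labels gathered so far.
        train_s_net(trusted)
        history.append(len(trusted))
    return history
```

Passing no-op callables is enough to trace the schedule: each round runs one R-Net step, refines one more group, filters it, and then runs one S-Net step on the growing pool of pseudo-labels.
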
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| RGB Salient Object Detection | ECSSD | F-Score | 0.899 | HybridSOD |
| RGB Salient Object Detection | ECSSD | MAE | 0.051 | HybridSOD |
| RGB Salient Object Detection | ECSSD | S-Measure | 0.886 | HybridSOD |
| RGB Salient Object Detection | PASCAL-S | F-Score | 0.827 | HybridSOD |
| RGB Salient Object Detection | PASCAL-S | MAE | 0.076 | HybridSOD |
| RGB Salient Object Detection | PASCAL-S | S-Measure | 0.828 | HybridSOD |
| RGB Salient Object Detection | HKU-IS | F-Score | 0.892 | HybridSOD |
| RGB Salient Object Detection | HKU-IS | MAE | 0.038 | HybridSOD |
| RGB Salient Object Detection | HKU-IS | S-Measure | 0.887 | HybridSOD |
| RGB Salient Object Detection | DUTS-TE | MAE | 0.05 | HybridSOD |
| RGB Salient Object Detection | DUTS-TE | S-Measure | 0.837 | HybridSOD |
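
For readers unfamiliar with the metrics in the table, the sketch below shows how MAE and an F-score are commonly computed for a saliency map against a binary ground-truth mask. It is a minimal illustration, not the evaluation code behind the reported numbers: the page does not state which F-score variant (maximum, mean, or adaptive-threshold) was used, so the fixed `threshold` is an assumption, and `beta_sq = 0.3` is the value conventionally used in SOD work. Inputs are assumed to be flattened sequences of values in [0, 1]. S-Measure is not covered here.

```python
from typing import Sequence


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between a saliency map and its ground truth."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(
    pred: Sequence[float],
    gt: Sequence[float],
    threshold: float = 0.5,  # assumed fixed threshold; other variants exist
    beta_sq: float = 0.3,    # conventional weighting in SOD evaluation
) -> float:
    """F-score of the thresholded saliency map against a binary mask."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    binary = [p >= threshold for p in pred]
    positives = [g >= 0.5 for g in gt]
    tp = sum(1 for b, g in zip(binary, positives) if b and g)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / sum(binary)
    recall = tp / sum(positives)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

Lower MAE is better, while higher F-score is better; setting `beta_sq` below 1 weights precision more heavily than recall.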