Siyue Yu, Jimin Xiao, Bingfeng Zhang, Eng Gee Lim
Co-salient object detection, which aims to detect the salient objects that co-occur across a group of images, is gaining popularity. Recent works use attention mechanisms or extra information to aggregate common co-salient features, which leads to incomplete or even incorrect responses for the target objects. In this paper, we aim to mine comprehensive co-salient features democratically and to reduce background interference without introducing any extra information. To achieve this, we design a democratic prototype generation module that generates democratic response maps, which cover sufficient co-salient regions and thereby capture more shared attributes of the co-salient objects. A comprehensive prototype based on these response maps can then be generated to guide the final prediction. To suppress noisy background information in the prototype, we propose a self-contrastive learning module in which both positive and negative pairs are formed without relying on additional classification information. We also design a democratic feature enhancement module that further strengthens the co-salient features by readjusting attention values. Extensive experiments show that, under the same settings, our model outperforms previous state-of-the-art methods, especially on challenging real-world cases (e.g., on CoCA, we obtain gains of 2.0% in MAE, 5.4% in maximum F-measure, 2.3% in maximum E-measure, and 3.7% in S-measure). Code will be released soon.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Saliency Detection | CoSOD3k | MAE | 0.067 | DCFM |
| Saliency Detection | CoSOD3k | S-measure | 0.809 | DCFM |
| Saliency Detection | CoSOD3k | max E-measure | 0.871 | DCFM |
| Saliency Detection | CoSOD3k | max F-measure | 0.805 | DCFM |
| Saliency Detection | CoSOD3k | mean E-measure | 0.871 | DCFM |
| Saliency Detection | CoSOD3k | mean F-measure | 0.8 | DCFM |
| Saliency Detection | CoCA | MAE | 0.085 | DCFM |
| Saliency Detection | CoCA | mean F-measure | 0.593 | DCFM |
| Saliency Detection | CoCA | S-measure | 0.71 | DCFM |
| Saliency Detection | CoCA | max E-measure | 0.783 | DCFM |
| Saliency Detection | CoCA | max F-measure | 0.598 | DCFM |
| Saliency Detection | CoCA | mean E-measure | 0.778 | DCFM |
| Saliency Detection | CoSal2015 | MAE | 0.067 | DCFM |
| Saliency Detection | CoSal2015 | S-measure | 0.838 | DCFM |
| Saliency Detection | CoSal2015 | max E-measure | 0.893 | DCFM |
| Saliency Detection | CoSal2015 | max F-measure | 0.856 | DCFM |
| Saliency Detection | CoSal2015 | mean E-measure | 0.889 | DCFM |
| Saliency Detection | CoSal2015 | mean F-measure | 0.85 | DCFM |
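
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how MAE and maximum F-measure are commonly computed for a single predicted saliency map. It is not the authors' evaluation code. The function names `mae` and `max_f_measure` are hypothetical, and the weighting `beta_sq = 0.3` and the 256 thresholds are conventions from the saliency detection literature that are assumed here, not stated in the source. Benchmark toolkits usually aggregate over a whole dataset before taking the maximum, so per-image values from this sketch need not match published numbers.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a saliency map and a mask, both in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def max_f_measure(pred, gt, beta_sq=0.3, num_thresholds=256):
    """Best F-measure over evenly spaced binarization thresholds (single image).

    beta_sq = 0.3 is an assumed convention that weights precision above recall.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt) > 0.5  # binarize the ground-truth mask
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds):
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        if tp == 0:
            # No true positives: F-measure is 0 at this threshold.
            continue
        precision = tp / binary.sum()
        recall = tp / gt.sum()
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
        best = max(best, float(f))
    return best
```

Lower MAE is better, while higher F-measure is better, which is why the two columns move in opposite directions when a model improves.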