Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Siamese Network for RGB-D Salient Object Detection and Beyond

Keren Fu, Deng-Ping Fan, Ge-Peng Ji, Qijun Zhao, Jianbing Shen, Ce Zhu

2020-08-26
Tasks: Semantic Segmentation, Salient Object Detection, RGB-D Salient Object Detection, object-detection, Object Detection, RGB Salient Object Detection

Abstract

Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL), and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the state-of-the-art models by an average of ~2.0% (max F-measure) across seven challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T (thermal infrared) SOD and video SOD, achieving comparable or even better performance against state-of-the-art methods. We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models on the task of RGB-D SOD. These facts further confirm that the proposed framework could offer a potential solution for various applications and provide more insight into the cross-modal complementarity task.
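
The central idea in the abstract, one shared backbone that processes both RGB and depth, can be illustrated with a toy sketch. This is not the authors' implementation: the single linear layer standing in for the backbone, the replication of depth to three channels, the helper names, and the element-wise sum used as a stand-in for fusion are all assumptions made for illustration. In particular, the sum is not the DCF module.

```python
import random


def make_shared_backbone(in_dim, out_dim, seed=0):
    """Build a toy 'backbone': one linear layer + ReLU with a single weight set.

    Hypothetical stand-in for a real CNN backbone.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def backbone(x):
        # Every call uses the same `weights`, which is what makes it Siamese.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return backbone


def depth_to_three_channels(depth_value):
    # Assumption: replicate the single depth value so it matches the RGB shape.
    return [depth_value] * 3


def siamese_forward(backbone, rgb_pixel, depth_value):
    """Run both modalities through the same backbone (joint learning idea)."""
    batch = [rgb_pixel, depth_to_three_channels(depth_value)]
    f_rgb, f_depth = [backbone(x) for x in batch]
    return f_rgb, f_depth


def fuse(f_rgb, f_depth):
    # Placeholder fusion (element-wise sum); NOT the paper's DCF module.
    return [a + b for a, b in zip(f_rgb, f_depth)]


backbone = make_shared_backbone(in_dim=3, out_dim=4)
f_rgb, f_depth = siamese_forward(backbone, [0.2, 0.7, 0.4], 0.5)
fused = fuse(f_rgb, f_depth)
```

Because the weights are shared, feeding identical inputs to both branches yields identical features, and any update to the weights affects both modalities at once.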

Results

Model: JL-DCF*

Dataset   Average MAE   S-Measure   max E-Measure   max F-Measure
NJU2K     0.04          91.1        94.8            91.3
STERE     0.039         91.1        94.9            90.7
SIP       0.046         89.2        94.9            90
NLPR      0.023         92.6        96.4            91.7
DES       0.021         93.6        97.5            92.9

The archive lists these same values under five task labels: Object Detection, 3D, 2D Classification, 2D Object Detection, and 16k.
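
For readers unfamiliar with two of the metrics above, the following is a minimal sketch of how MAE and max F-measure are commonly computed for saliency maps. It assumes flattened maps with predictions in [0, 1] and binary ground truth, 256 evenly spaced thresholds, and the conventional weighting beta^2 = 0.3; none of these settings are stated on this page, and the function names are hypothetical. The sketch returns fractions in [0, 1], whereas the F-measure values above are shown as percentages. S-measure and E-measure are omitted.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred, gt, threshold, beta_sq=0.3):
    """Weighted F-measure after binarizing `pred` at `threshold`.

    beta_sq=0.3 is the conventional choice in saliency work (an assumption here).
    """
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(1 for b, g in zip(binary, gt) if b == 1 and g == 1)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / sum(binary)
    recall = tp / sum(gt)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(pred, gt, steps=256):
    """Best F-measure over `steps` evenly spaced thresholds in [0, 1]."""
    return max(f_measure(pred, gt, t / (steps - 1)) for t in range(steps))


# Tiny flattened example: two salient pixels, two background pixels.
pred = [0.9, 0.8, 0.2, 0.1]
gt = [1, 1, 0, 0]
print(mae(pred, gt))
print(max_f_measure(pred, gt))
```

Lower is better for MAE; higher is better for the F-measure.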

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)