Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection

Xiaoqi Zhao, Lihe Zhang, Youwei Pang, Huchuan Lu, Lei Zhang

Published: 2020-07-14 · ECCV 2020

Tasks: Thermal Image Segmentation · Salient Object Detection · RGB-D Salient Object Detection · Object Detection · RGB Salient Object Detection

Links: Paper · PDF · Code (official)

Abstract

Existing RGB-D salient object detection (SOD) approaches concentrate on the cross-modal fusion between the RGB stream and the depth stream. They do not deeply explore the effect of the depth map itself. In this work, we design a single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model. We tactfully utilize depth information from two perspectives: (1) Overcoming the incompatibility problem caused by the great difference between modalities, we build a single stream encoder to achieve the early fusion, which can take full advantage of the ImageNet pre-trained backbone model to extract rich and discriminative features. (2) We design a novel depth-enhanced dual attention module (DEDA) to efficiently provide the fore-/back-ground branches with the spatially filtered features, which enables the decoder to optimally perform the middle fusion. Besides, we put forward a pyramidally attended feature extraction module (PAFE) to accurately localize the objects of different scales. Extensive experiments demonstrate that the proposed model performs favorably against most state-of-the-art methods under different evaluation metrics. Furthermore, this model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a 384 × 384 image.
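The two ideas the abstract describes — early fusion (feeding RGB and depth through a single shared encoder instead of a separate depth stream) and depth-guided spatial attention — can be illustrated with a minimal numpy sketch. This is an illustrative assumption of the data flow, not the paper's official implementation; the function names and the toy gating rule are invented for the example, and the real DEDA module learns its attention rather than using raw normalized depth.

```python
import numpy as np

def early_fusion(rgb, depth):
    """Early fusion: concatenate RGB (H, W, 3) and depth (H, W, 1)
    into one 4-channel input for a single shared backbone, so no
    separate depth encoder is needed."""
    return np.concatenate([rgb, depth], axis=-1)

def depth_attention(features, depth):
    """Toy depth-guided spatial gating: scale every feature channel
    by the depth map normalized to [0, 1]. The paper's DEDA instead
    learns this attention, but the broadcasting pattern is the same."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    return features * d  # (H, W, 1) broadcasts over the channel axis

rgb = np.random.rand(384, 384, 3)     # the paper's 384 x 384 input size
depth = np.random.rand(384, 384, 1)
fused = early_fusion(rgb, depth)
print(fused.shape)                    # (384, 384, 4)
```

In practice the 4-channel input only requires adapting the first convolution of an ImageNet-pretrained backbone; all deeper layers and their pretrained weights are reused unchanged, which is what keeps the model lightweight.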

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | RGB-T-Glass-Segmentation | MAE | 0.069 | DANet
Object Detection | NJU2K | Average MAE | 0.046 | DANet
Object Detection | NJU2K | S-Measure | 89.7 | DANet
Object Detection | NJU2K | max F-Measure | 90.5 | DANet

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)