Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao

Published 2021-09-15 · ICCV 2021 · Tasks: Thermal Image Segmentation, Saliency Detection

Abstract

Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between the RGB image and depth data. Specifically, we first map the feature of each modality to a lower-dimensional feature vector and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to promote the development of this field, we contribute the largest such dataset (7x larger than NJU2K), which contains 15,625 image pairs with high-quality polygon-, scribble-, object-, instance-, and rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena that can motivate future model design. Source code and dataset are available at https://github.com/JingZhang617/cascaded_rgbd_sod.
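The regularization idea in the abstract (project each modality to a low-dimensional code, then penalize the mutual information between the RGB and depth codes at every stage) can be illustrated with a small NumPy sketch. The abstract does not say which MI estimator the paper uses, so this sketch swaps in a closed-form estimate that assumes the two codes are jointly Gaussian: I(z_a; z_g) = 0.5 (log det S_a + log det S_g - log det S_joint), computed from batch covariances. The names `project`, `gaussian_mutual_information`, `cascaded_mi_penalty`, the weight `lam`, and the random stand-in features are all hypothetical, and the code is forward-only; an actual training setup would compute the penalty in an autodiff framework and add it to the saliency loss.

```python
import numpy as np


def project(features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Map (batch, C) features to a lower-dimensional (batch, d) code via a linear map."""
    return features @ weight


def gaussian_mutual_information(za: np.ndarray, zg: np.ndarray, eps: float = 1e-6) -> float:
    """Estimate I(za; zg) assuming (za, zg) are jointly Gaussian (an assumed estimator).

    I(za; zg) = 0.5 * (log det S_a + log det S_g - log det S_joint),
    where the S terms are sample covariances over the batch.
    """
    joint = np.concatenate([za, zg], axis=1)
    d_a = za.shape[1]
    # Sample covariance of the concatenated codes, plus a small ridge for numerical stability.
    cov = np.cov(joint, rowvar=False) + eps * np.eye(joint.shape[1])
    _, logdet_a = np.linalg.slogdet(cov[:d_a, :d_a])
    _, logdet_g = np.linalg.slogdet(cov[d_a:, d_a:])
    _, logdet_joint = np.linalg.slogdet(cov)
    return float(0.5 * (logdet_a + logdet_g - logdet_joint))


def cascaded_mi_penalty(stage_codes, lam: float = 0.1) -> float:
    """Sum the MI estimate over every (rgb_code, depth_code) stage pair; lam is hypothetical."""
    return lam * sum(gaussian_mutual_information(za, zg) for za, zg in stage_codes)


rng = np.random.default_rng(0)
rgb = rng.normal(size=(256, 64))                          # stand-in RGB appearance features
depth_indep = rng.normal(size=(256, 64))                  # depth features unrelated to RGB
depth_redundant = rgb + 0.1 * rng.normal(size=(256, 64))  # depth nearly duplicating RGB
w = rng.normal(size=(64, 8))                              # shared projection to an 8-d code

za = project(rgb, w)
mi_indep = gaussian_mutual_information(za, project(depth_indep, w))
mi_redundant = gaussian_mutual_information(za, project(depth_redundant, w))
# Three identical "stages" just to show the per-stage penalties being summed.
penalty = cascaded_mi_penalty([(za, project(depth_redundant, w))] * 3)
```

Redundant depth codes yield a much larger MI estimate than independent ones, so minimizing this term pushes the two branches toward complementary rather than duplicated information. The Gaussian closed form was chosen here only because it is exact under its assumption and needs no extra networks.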

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | RGB-T-Glass-Segmentation | MAE | 0.041 | CLNet
Scene Segmentation | RGB-T-Glass-Segmentation | MAE | 0.041 | CLNet
2D Object Detection | RGB-T-Glass-Segmentation | MAE | 0.041 | CLNet
10-shot image generation | RGB-T-Glass-Segmentation | MAE | 0.041 | CLNet
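MAE (mean absolute error, lower is better) compares a predicted map with its ground-truth mask pixel by pixel. A minimal sketch follows, assuming predictions are normalized to [0, 1], masks are binary, and the dataset score is the mean of per-image scores; the exact evaluation script behind the leaderboard value is not given on this page, and the helper names `saliency_mae` and `dataset_mae` are hypothetical.

```python
import numpy as np


def saliency_mae(pred, gt) -> float:
    """Mean absolute error between one predicted map and its ground-truth mask.

    Assumes pred holds values in [0, 1] and gt is a binary {0, 1} mask of the same shape.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    return float(np.mean(np.abs(pred - gt)))


def dataset_mae(pairs) -> float:
    """Average the per-image MAE over an iterable of (pred, gt) pairs."""
    return float(np.mean([saliency_mae(p, g) for p, g in pairs]))


pred = [[0.9, 0.1], [0.2, 0.8]]
gt = [[1, 0], [0, 1]]
single = saliency_mae(pred, gt)                 # roughly 0.15 (mean of 0.1, 0.1, 0.2, 0.2)
overall = dataset_mae([(pred, gt), (gt, gt)])   # a perfect second image halves the average
```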

Related Papers

- Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
- Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation (2025-05-11)
- Low-Rate Semantic Communication with Codebook-based Conditional Generative Models (2025-04-07)
- Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization (2025-03-22)
- A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces (2025-03-21)
- Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance (2025-03-04)
- Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset (2024-12-13)
- Unlocking Comics: The AI4VA Dataset for Visual Understanding (2024-10-27)