RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao

Published 2021-09-15, ICCV 2021. Tasks: Thermal Image Segmentation, Saliency Detection.

Abstract

Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between the RGB image and depth data. Specifically, we first map the features of each modality to a lower-dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to foster the development of this field, we contribute the largest dataset (7x larger than NJU2K), which contains 15,625 image pairs with high-quality polygon-, scribble-, object-, instance-, and rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena that can motivate future model design. Source code and dataset are available at "https://github.com/JingZhang617/cascaded_rgbd_sod".
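The abstract does not say how the mutual information between the low-dimensional RGB and depth vectors is estimated. As a minimal sketch, assuming the two feature vectors are jointly Gaussian, mutual information has a closed form from covariance log-determinants: I(X;Y) = 0.5 * (log det S_x + log det S_y - log det S_xy), where S_x, S_y and S_xy are the RGB, depth and joint covariances. The code estimates this from a batch of features and sums it over stages as a regularizer added to a task loss. The Gaussian estimator, the names `gaussian_mutual_information` and `cascaded_objective`, and the `weight` parameter are illustrative assumptions, not the authors' method. NumPy is not differentiable, so actual training would need an autodiff framework; this only shows the arithmetic.

```python
import numpy as np


def gaussian_mutual_information(x: np.ndarray, y: np.ndarray, eps: float = 1e-6) -> float:
    """Estimate I(X;Y) in nats, ASSUMING (X, Y) is jointly Gaussian.

    x: (n_samples, d_x) RGB feature vectors.
    y: (n_samples, d_y) depth feature vectors.
    """
    if x.ndim != 2 or y.ndim != 2 or x.shape[0] != y.shape[0]:
        raise ValueError("x and y must be 2-D with the same number of samples")
    dx = x.shape[1]
    joint = np.concatenate([x, y], axis=1)
    # Sample covariance of the joint vector; eps * I keeps it positive definite.
    cov = np.cov(joint, rowvar=False) + eps * np.eye(joint.shape[1])
    cov_x = cov[:dx, :dx]
    cov_y = cov[dx:, dx:]
    # slogdet returns (sign, log|det|); the sign is +1 for PD matrices.
    _, logdet_x = np.linalg.slogdet(cov_x)
    _, logdet_y = np.linalg.slogdet(cov_y)
    _, logdet_xy = np.linalg.slogdet(cov)
    return float(0.5 * (logdet_x + logdet_y - logdet_xy))


def cascaded_objective(task_loss, rgb_stages, depth_stages, weight=0.1):
    """Hypothetical cascaded objective: task loss plus MI summed over every stage."""
    mi_total = sum(
        gaussian_mutual_information(r, d) for r, d in zip(rgb_stages, depth_stages)
    )
    return task_loss + weight * mi_total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(2000, 4))
    depth_indep = rng.normal(size=(2000, 4))
    depth_redundant = rgb + 0.1 * rng.normal(size=(2000, 4))
    print(gaussian_mutual_information(rgb, depth_indep))      # close to 0
    print(gaussian_mutual_information(rgb, depth_redundant))  # several nats
```

On independent samples the estimate stays near zero, while depth features that largely copy the RGB features give a large value; that redundancy is what a mutual-information-minimization regularizer would push down at each stage.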

Results

Dataset	Metric	Value	Model	Listed tasks
RGB-T-Glass-Segmentation	MAE	0.041	CLNet	Semantic Segmentation, Scene Segmentation, 2D Object Detection, 10-shot image generation
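The result above is reported as MAE. In saliency and binary segmentation benchmarks, MAE is commonly computed as the mean absolute per-pixel difference between a predicted map scaled to [0, 1] and the binary ground truth, then averaged over images. The sketch below follows that common convention; the source does not state the exact protocol (resizing, normalization, averaging) behind the 0.041 figure, and `saliency_mae` and `dataset_mae` are hypothetical names.

```python
import numpy as np


def saliency_mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a predicted map and its ground-truth mask.

    Both inputs are H x W arrays ASSUMED to be scaled to [0, 1]
    (divide uint8 maps by 255 first).
    """
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    diff = np.abs(pred.astype(np.float64) - gt.astype(np.float64))
    return float(diff.mean())


def dataset_mae(pairs) -> float:
    """Average the per-image MAE over (pred, gt) pairs (assumed convention)."""
    scores = [saliency_mae(p, g) for p, g in pairs]
    return float(np.mean(scores))


if __name__ == "__main__":
    pred = np.array([[0.9, 0.1], [0.2, 0.8]])
    gt = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(saliency_mae(pred, gt))  # 0.15 (up to float rounding)
```

Averaging per image, rather than pooling all pixels, keeps each image's weight equal regardless of resolution.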
