DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou

2023-09-18

Tasks: 3D geometry, Representation Learning, Segmentation, Semantic Segmentation, Salient Object Detection, RGB-D Salient Object Detection, Object Detection

Abstract

We present DFormer, a novel RGB-D pretraining framework to learn transferable representations for RGB-D segmentation tasks. DFormer has two key innovations: 1) Unlike previous works that encode RGB-D information with an RGB-pretrained backbone, we pretrain the backbone using image-depth pairs from ImageNet-1K, so DFormer is endowed with the capacity to encode RGB-D representations; 2) DFormer comprises a sequence of RGB-D blocks, which are tailored for encoding both RGB and depth information through a novel building block design. DFormer avoids the mismatched encoding of the 3D geometry relationships in depth maps by RGB-pretrained backbones, a problem that is widespread in existing methods but has not been resolved. We finetune the pretrained DFormer on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient object detection, with a lightweight decoder head. Experimental results show that DFormer achieves new state-of-the-art performance on these two tasks, across two RGB-D semantic segmentation datasets and five RGB-D salient object detection datasets, with less than half of the computational cost of the current best methods. Our code is available at: https://github.com/VCIP-RGBD/DFormer.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | SYN-UDTIRI | IoU | 90.88 | DFormer |
| Object Detection | NJU2K | Average MAE | 0.023 | DFormer-L |
| Object Detection | NJU2K | S-Measure | 93.7 | DFormer-L |
| Object Detection | NJU2K | max E-Measure | 96.4 | DFormer-L |
| Object Detection | NJU2K | max F-Measure | 94.6 | DFormer-L |
| Object Detection | STERE | Average MAE | 0.03 | DFormer-L |
| Object Detection | STERE | S-Measure | 92.3 | DFormer-L |
| Object Detection | STERE | max E-Measure | 95.2 | DFormer-L |
| Object Detection | STERE | max F-Measure | 92.9 | DFormer-L |
| Object Detection | SIP | Average MAE | 0.032 | DFormer-L |
| Object Detection | SIP | S-Measure | 91.5 | DFormer-L |
| Object Detection | SIP | max E-Measure | 95 | DFormer-L |
| Object Detection | SIP | max F-Measure | 93.8 | DFormer-L |
| Object Detection | NLPR | Average MAE | 0.016 | DFormer-L |
| Object Detection | NLPR | S-Measure | 94.2 | DFormer-L |
| Object Detection | NLPR | max E-Measure | 97.1 | DFormer-L |
| Object Detection | NLPR | max F-Measure | 93.9 | DFormer-L |
| Object Detection | DES | Average MAE | 0.013 | DFormer-L |
| Object Detection | DES | S-Measure | 94.8 | DFormer-L |
| Object Detection | DES | max E-Measure | 98 | DFormer-L |
| Object Detection | DES | max F-Measure | 95.6 | DFormer-L |
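
For readers unfamiliar with two of the metrics listed above, the following is a minimal sketch of mean absolute error (MAE) for a saliency map and intersection over union (IoU) for a binary mask, using their standard textbook definitions. The function names and the flattened-list inputs are assumptions made for illustration; this page does not state the exact evaluation protocol behind the reported values, and the S-, E- and F-measures are not covered here.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average per-pixel absolute difference between two flattened maps in [0, 1]."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def binary_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks (1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("inputs must be of equal length")
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return intersection / union if union else 1.0


# Hypothetical toy inputs: a 4-pixel saliency map and a 4-pixel mask.
toy_mae = mean_absolute_error([0.9, 0.1, 0.8, 0.0], [1.0, 0.0, 1.0, 0.0])
toy_iou = binary_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

Lower MAE is better, while higher IoU is better; the IoU in the table appears to be expressed as a percentage, whereas this sketch returns a fraction between 0 and 1.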

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)