Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images

Pardis Taghavi, Reza Langari, Gaurav Pandey

2024-03-15
Tasks: Real-Time Semantic Segmentation, Segmentation, Semantic Segmentation, Multi-Task Learning, Depth Estimation, Monocular Depth Estimation

Abstract

This research paper presents an innovative multi-task learning framework that performs depth estimation and semantic segmentation concurrently from a single camera. The proposed approach is based on a shared encoder-decoder architecture that integrates several techniques to improve the accuracy of both the depth estimation and semantic segmentation tasks without compromising computational efficiency. Additionally, the framework incorporates an adversarial training component, a Wasserstein GAN with a critic network, to refine the model's predictions. The framework is evaluated on two datasets, the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset, and it outperforms existing state-of-the-art methods in both segmentation and depth estimation. We also conducted ablation studies to analyze the contributions of the individual components, including pre-training strategies, the inclusion of critics, the use of logarithmic depth scaling, and advanced image augmentations, to provide a better understanding of the proposed framework. The accompanying source code is available at https://github.com/PardisTaghavi/SwinMTL.
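To make the ingredients named in the abstract concrete, here is a minimal, hypothetical sketch of how a shared multi-task objective could combine a segmentation loss, a depth loss on log-scaled depth, and a Wasserstein GAN adversarial term. The abstract does not give the paper's actual loss formulation, so every function name, the log transform, the choice of an L1 loss, the assumption that the critic scores ground-truth maps as "real" and predictions as "fake", and the loss weights below are assumptions for illustration only. It uses plain Python lists instead of tensors so it runs without a deep learning framework.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: the paper's exact loss terms and weights are NOT given in
# the abstract. Names, the log transform, and the weights are assumptions.


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def log_depth(depths: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Map metric depths to log space (one possible 'logarithmic depth scaling')."""
    return [math.log(d + eps) for d in depths]


def log_depth_l1(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Mean absolute error between log-depths.

    |log p - log t| = |log(p / t)|, so a 10% error costs the same near or far.
    """
    diffs = [abs(a - b) for a, b in zip(log_depth(pred, eps), log_depth(target, eps))]
    return _mean(diffs)


def pixel_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy over pixels; logits[i] holds one score per class."""
    losses = []
    for scores, y in zip(logits, labels):
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(lse - scores[y])
    return _mean(losses)


def wgan_critic_loss(critic_real: Sequence[float], critic_fake: Sequence[float]) -> float:
    """The critic maximizes E[D(real)] - E[D(fake)]; return the negation to minimize.

    WGAN also needs a Lipschitz constraint on the critic (weight clipping or a
    gradient penalty), which is not shown here.
    """
    return _mean(critic_fake) - _mean(critic_real)


def wgan_generator_loss(critic_fake: Sequence[float]) -> float:
    """The predictor (generator) tries to raise the critic's score on its outputs."""
    return -_mean(critic_fake)


def multitask_loss(seg: float, depth: float, adv: float,
                   w_seg: float = 1.0, w_depth: float = 1.0, w_adv: float = 0.01) -> float:
    """Weighted sum of task losses; the default weights are illustrative only."""
    return w_seg * seg + w_depth * depth + w_adv * adv


if __name__ == "__main__":
    seg = pixel_cross_entropy([[2.0, 0.0], [0.0, 2.0]], [0, 1])
    depth = log_depth_l1([10.0, 20.0], [12.0, 18.0])
    adv = wgan_generator_loss([0.3, -0.1])
    print(multitask_loss(seg, depth, adv))
```

Keeping the adversarial weight small relative to the task losses is a common way to let the critic refine predictions without dominating training, but the value used here is an assumption, not the paper's setting.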

Results

Task | Dataset | Metric | Value | Model
Depth Estimation | Cityscapes test | RMSE | 6.352 | SwinMTL
Depth Estimation | Cityscapes | Absolute relative error (AbsRel) | 0.089 | SwinMTL
Depth Estimation | Cityscapes | RMSE | 5.481 | SwinMTL
Depth Estimation | Cityscapes | RMSE log | 0.139 | SwinMTL
Depth Estimation | Cityscapes | Square relative error (SqRel) | 1.051 | SwinMTL
Transfer Learning | NYUv2 | Mean IoU | 58.14 | SwinMTL
Transfer Learning | Cityscapes test | RMSE | 0.51 | SwinMTL
Transfer Learning | Cityscapes test | mIoU | 76.41 | SwinMTL
Semantic Segmentation | Cityscapes val | mIoU | 76.41 | SwinMTL
3D | Cityscapes test | RMSE | 6.352 | SwinMTL
3D | Cityscapes | Absolute relative error (AbsRel) | 0.089 | SwinMTL
3D | Cityscapes | RMSE | 5.481 | SwinMTL
3D | Cityscapes | RMSE log | 0.139 | SwinMTL
3D | Cityscapes | Square relative error (SqRel) | 1.051 | SwinMTL
Multi-Task Learning | NYUv2 | Mean IoU | 58.14 | SwinMTL
Multi-Task Learning | Cityscapes test | RMSE | 0.51 | SwinMTL
Multi-Task Learning | Cityscapes test | mIoU | 76.41 | SwinMTL
10-shot image generation | Cityscapes val | mIoU | 76.41 | SwinMTL
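For reference, the metrics in the table above are commonly defined as follows in monocular depth estimation and semantic segmentation. The sketch below is a minimal, assumption-laden illustration, not the paper's evaluation code: it assumes natural log for RMSE log, treats pixels with ground-truth depth of 0 or less as invalid, clamps predictions to a hypothetical minimum depth, and uses the hypothetical `ignore_index=255` for void labels (a common Cityscapes convention). The depth range, crops, and masks used for the reported numbers are not stated on this page.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical evaluation sketch; the paper's exact protocol is not given here.


def depth_metrics(pred: Sequence[float], gt: Sequence[float],
                  min_depth: float = 1e-3) -> Dict[str, float]:
    """AbsRel, SqRel, RMSE, and RMSE log over pixels with valid ground truth (gt > 0)."""
    # Clamp predictions so the log is defined; skip pixels without ground truth.
    pairs = [(max(p, min_depth), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)
    return {"AbsRel": abs_rel, "SqRel": sq_rel, "RMSE": rmse, "RMSE log": rmse_log}


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU = mean over classes of TP / (TP + FP + FN), from a confusion matrix.

    Assumes predicted labels lie in [0, num_classes). Classes that never appear
    in either prediction or ground truth (union 0) are skipped.
    """
    conf: List[List[int]] = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue
        conf[g][p] += 1  # rows: ground truth, columns: prediction
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(conf[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else float("nan")
```

Lower is better for the four depth errors and higher is better for mIoU. For dataset-level mIoU, the confusion matrix is usually accumulated over all images before computing per-class IoU, which this function does if it is given the pixels of the whole evaluation set.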

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)