Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation

Mu Chen, Zhedong Zheng, Yi Yang

2023-11-21 · Scene Segmentation · Synthetic-to-Real Translation · Depth Estimation · Unsupervised Domain Adaptation · Domain Adaptation

Paper · PDF · Code (official) · Code

Abstract

Scene segmentation via unsupervised domain adaptation (UDA) enables the transfer of knowledge acquired from synthetic source data to real-world target data, which largely reduces the need for manual pixel-level annotations in the target domain. To facilitate domain-invariant feature learning, existing methods typically mix data from the source and target domains by simply copying and pasting pixels. Such vanilla methods are usually sub-optimal because they do not take into account how well the mixed layouts correspond to real-world scenarios, which have an inherent layout. We observe that semantic categories such as sidewalks, buildings, and sky display relatively consistent depth distributions and can be clearly distinguished in a depth map. Based on this observation, we propose a depth-aware framework that explicitly leverages depth estimation to mix the categories and facilitates the two complementary tasks, i.e., segmentation and depth learning, in an end-to-end manner. In particular, the framework contains a Depth-guided Contextual Filter (DCF) for data augmentation and a cross-task encoder for contextual learning. DCF simulates real-world layouts, while the cross-task encoder adaptively fuses the complementary features of the two tasks. Since several public datasets do not provide depth annotations, we leverage an off-the-shelf depth estimation network to generate pseudo depth. Extensive experiments show that our proposed method, even with pseudo depth, achieves competitive performance on two widely used benchmarks: 77.7 mIoU on GTA-to-Cityscapes and 69.3 mIoU on SYNTHIA-to-Cityscapes.
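The depth-guided mixing idea can be sketched in a few lines: copy source pixels of chosen classes onto a target image, but keep only pixels whose source depth roughly matches the target depth at the same location. This is a minimal illustrative sketch, not the paper's DCF implementation; the function name, the depth-tolerance criterion, and the `depth_tol` parameter are assumptions made for illustration.

```python
import numpy as np

def depth_aware_classmix(src_img, src_label, src_depth,
                         tgt_img, tgt_depth,
                         classes, depth_tol=0.15):
    """Paste pixels of the chosen source classes onto the target image,
    keeping only pixels whose source depth is close to the target depth
    at the same location (hypothetical criterion, for illustration).

    src_img, tgt_img     : (H, W, 3) float arrays
    src_label            : (H, W) int array of semantic class ids
    src_depth, tgt_depth : (H, W) float arrays, normalised to [0, 1]
    classes              : iterable of class ids to transfer
    depth_tol            : max |src_depth - tgt_depth| for a pasted pixel
    """
    # Select pixels belonging to the classes we want to transfer.
    mask = np.isin(src_label, list(classes))
    # Reject pasted pixels that would break the target's depth layout,
    # e.g. a distant "sky" pixel landing on a nearby "road" region.
    mask &= np.abs(src_depth - tgt_depth) < depth_tol
    mixed = np.where(mask[..., None], src_img, tgt_img)
    return mixed, mask
```

A plain copy-paste mix would use only the class mask; the extra depth check is what makes the mixed layout respect the scene's depth structure.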

Results

Task | Dataset | Metric | Value | Model
Image-to-Image Translation | GTAV-to-Cityscapes Labels | mIoU | 77.7 | DCF
Image-to-Image Translation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 75.9 | DCF
Image-to-Image Translation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 69.3 | DCF
Domain Adaptation | GTA5-to-Cityscapes | mIoU | 77.7 | DCF
Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU | 69.3 | DCF
Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 69.3 | DCF
Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 75.9 | DCF
Image Generation | GTAV-to-Cityscapes Labels | mIoU | 77.7 | DCF
Image Generation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 75.9 | DCF
Image Generation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 69.3 | DCF
Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 69.3 | DCF
Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU | 69.3 | DCF
Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 75.9 | DCF
1 Image, 2*2 Stitching | GTAV-to-Cityscapes Labels | mIoU | 77.7 | DCF
1 Image, 2*2 Stitching | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 75.9 | DCF
1 Image, 2*2 Stitching | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 69.3 | DCF

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
Cameras as Relative Positional Encoding (2025-07-14)