Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

Xiankang He, Dongyan Guo, Hongji Li, Ruibo Li, Ying Cui, Chi Zhang

2025-02-26 · Scene Understanding · Depth Estimation · Monocular Depth Estimation

Paper · PDF · Code (official)

Abstract

Recent advances in zero-shot monocular depth estimation (MDE) have significantly improved generalization by unifying depth distributions through normalized depth representations and by leveraging large-scale unlabeled data via pseudo-label distillation. However, existing methods that rely on global depth normalization treat all depth values equally, which can amplify noise in pseudo-labels and reduce distillation effectiveness. In this paper, we present a systematic analysis of depth normalization strategies in the context of pseudo-label distillation. Our study shows that, under recent distillation paradigms (e.g., shared-context distillation), normalization is not always necessary, as omitting it can help mitigate the impact of noisy supervision. Furthermore, rather than focusing solely on how depth information is represented, we propose Cross-Context Distillation, which integrates both global and local depth cues to enhance pseudo-label quality. We also introduce an assistant-guided distillation strategy that incorporates complementary depth priors from a diffusion-based teacher model, enhancing supervision diversity and robustness. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
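
To make the normalization discussion concrete, here is a minimal PyTorch sketch (not the authors' code) of two ingredients the abstract describes: an affine-invariant global normalization applied before the distillation loss, and a cross-context scheme in which the same image region is supervised by pseudo-labels predicted from both the full image and a local crop. The function names, the L1 loss choice, and the equal weighting of the two cross-context terms are illustrative assumptions; the official code gives the actual formulation.

import torch
import torch.nn.functional as F

def global_norm(d, eps=1e-6):
    # Affine-invariant global normalization of a depth map d of shape
    # (B, 1, H, W): subtract the per-image median, divide by the mean
    # absolute deviation. Because the statistics are computed over the
    # whole map, noisy pseudo-label regions shift them everywhere,
    # which is the failure mode the paper analyzes.
    b = d.shape[0]
    t = d.flatten(1).median(dim=1).values.view(b, 1, 1, 1)
    s = (d - t).abs().flatten(1).mean(dim=1).view(b, 1, 1, 1)
    return (d - t) / (s + eps)

def distill_loss(student_depth, pseudo_depth, normalize=True):
    # Pseudo-label distillation as a simple L1 loss, optionally after
    # global normalization of both maps.
    if normalize:
        student_depth = global_norm(student_depth)
        pseudo_depth = global_norm(pseudo_depth)
    return F.l1_loss(student_depth, pseudo_depth)

def cross_context_loss(student, teacher, image, crop):
    # Cross-Context Distillation, simplified: the student's prediction
    # on a region is supervised both by the teacher's prediction on the
    # full image (global context) and by its prediction on the crop
    # alone (local context). crop = (y0, x0, h, w); in practice the
    # crop would typically be resized to the network's input resolution.
    y0, x0, h, w = crop
    s_region = student(image)[:, :, y0:y0 + h, x0:x0 + w]
    with torch.no_grad():
        t_global = teacher(image)[:, :, y0:y0 + h, x0:x0 + w]
        t_local = teacher(image[:, :, y0:y0 + h, x0:x0 + w])
    return distill_loss(s_region, t_global) + distill_loss(s_region, t_local)

Passing normalize=False to distill_loss corresponds to the un-normalized variant that the paper finds can better tolerate noisy supervision under shared-context distillation.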

Results

Task             | Dataset      | Metric                  | Value | Model
Depth Estimation | ScanNetV2    | Delta < 1.25            | 0.98  | Distill Any Depth
Depth Estimation | ScanNetV2    | absolute relative error | 0.042 | Distill Any Depth
Depth Estimation | NYU-Depth V2 | Delta < 1.25            | 0.981 | Distill Any Depth
Depth Estimation | NYU-Depth V2 | absolute relative error | 0.043 | Distill Any Depth
Depth Estimation | ETH3D        | Delta < 1.25            | 0.981 | Distill Any Depth
Depth Estimation | ETH3D        | absolute relative error | 0.054 | Distill Any Depth
3D               | ScanNetV2    | Delta < 1.25            | 0.98  | Distill Any Depth
3D               | ScanNetV2    | absolute relative error | 0.042 | Distill Any Depth
3D               | NYU-Depth V2 | Delta < 1.25            | 0.981 | Distill Any Depth
3D               | NYU-Depth V2 | absolute relative error | 0.043 | Distill Any Depth
3D               | ETH3D        | Delta < 1.25            | 0.981 | Distill Any Depth
3D               | ETH3D        | absolute relative error | 0.054 | Distill Any Depth
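
The two metrics in the table have standard definitions in the monocular depth literature, sketched below in PyTorch. The least-squares scale-and-shift alignment reflects the usual zero-shot evaluation protocol for affine-invariant predictions; the paper's exact protocol may differ in detail, and the helper name depth_metrics is ours.

import torch

def depth_metrics(pred, gt):
    # Standard depth-estimation metrics, as reported in the table:
    #   AbsRel       = mean(|pred - gt| / gt)
    #   Delta < 1.25 = fraction of pixels with max(pred/gt, gt/pred) < 1.25
    valid = gt > 0                      # score only pixels with ground truth
    p, g = pred[valid], gt[valid]
    # Least-squares scale/shift alignment: solve min ||a*p + b - g||^2,
    # the common protocol for affine-invariant zero-shot predictions.
    A = torch.stack([p, torch.ones_like(p)], dim=1)
    a, b = torch.linalg.lstsq(A, g.unsqueeze(1)).solution.squeeze(1)
    p = (a * p + b).clamp(min=1e-6)     # keep depths positive for the ratios
    abs_rel = ((p - g).abs() / g).mean().item()
    delta1 = (torch.maximum(p / g, g / p) < 1.25).float().mean().item()
    return abs_rel, delta1

Higher Delta < 1.25 and lower AbsRel are better: for example, 0.981 / 0.043 on NYU-Depth V2 means 98.1% of pixels fall within a factor-1.25 ratio of the ground truth, with a 4.3% mean relative error.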

Related Papers

Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation (2025-07-15)