Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

Pan Ji, Runze Li, Bir Bhanu, Yi Xu

2021-07-26 · ICCV 2021 · Pose Estimation · Depth Estimation · Monocular Depth Estimation

Abstract

Self-supervised depth estimation for indoor environments is more challenging than its outdoor counterpart in at least two aspects: (i) the depth range of indoor sequences varies widely across frames, making it difficult for the depth network to induce consistent depth cues, whereas the maximum distance in outdoor scenes mostly stays the same because the camera usually sees the sky; (ii) indoor sequences contain much more rotational motion, which causes difficulties for the pose network, while the motion in outdoor sequences is predominantly translational, especially in driving datasets such as KITTI. In this paper, special consideration is given to these challenges, and a set of good practices is consolidated for improving the performance of self-supervised monocular depth estimation in indoor environments. The proposed method mainly consists of two novel modules, i.e., a depth factorization module and a residual pose estimation module, each designed to tackle one of the aforementioned challenges. The effectiveness of each module is shown through a carefully conducted ablation study and a demonstration of state-of-the-art performance on three indoor datasets, i.e., EuRoC, NYUv2, and 7-Scenes.
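The two module ideas named in the abstract can be illustrated in a minimal sketch: factorizing depth into a per-frame global scale times a normalized relative-depth map, and refining an initial camera pose by composing it with a predicted residual transform. The function names (`factorized_depth`, `compose_pose`) and the toy values below are illustrative assumptions, not the authors' actual implementation or API.

```python
import numpy as np

def factorized_depth(rel_depth: np.ndarray, global_scale: float) -> np.ndarray:
    """Depth factorization idea: a depth head predicts relative depth
    (normalized per frame), and a separate branch predicts one global
    scale per frame, so the depth network need not absorb the large
    indoor depth-range variation by itself."""
    return global_scale * rel_depth

def compose_pose(init_pose: np.ndarray, residual_pose: np.ndarray) -> np.ndarray:
    """Residual pose estimation idea: refine an initial inter-frame pose
    with a residual 4x4 rigid transform predicted from the image pair
    warped by the initial pose."""
    return residual_pose @ init_pose

# Toy usage: relative depth in [0, 1] rescaled to a ~3 m indoor scene.
rel = np.array([[0.1, 0.5], [0.9, 1.0]])
depth = factorized_depth(rel, global_scale=3.0)

# Identity initial pose refined by a small translation residual.
T_init = np.eye(4)
T_res = np.eye(4)
T_res[0, 3] = 0.02
T = compose_pose(T_init, T_res)
```

The point of both sketches is the same: each network solves an easier sub-problem (relative depth, or a small pose correction) than the full regression task.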

Results

| Task             | Dataset                       | Metric                           | Value | Model      |
|------------------|-------------------------------|----------------------------------|-------|------------|
| Depth Estimation | NYU-Depth V2 self-supervised  | Absolute relative error (AbsRel) | 0.134 | MonoIndoor |
| Depth Estimation | NYU-Depth V2 self-supervised  | Root mean square error (RMSE)    | 0.526 | MonoIndoor |
| Depth Estimation | NYU-Depth V2 self-supervised  | delta_1                          | 82.3  | MonoIndoor |
| Depth Estimation | NYU-Depth V2 self-supervised  | delta_2                          | 95.8  | MonoIndoor |
| Depth Estimation | NYU-Depth V2 self-supervised  | delta_3                          | 98.9  | MonoIndoor |
| 3D               | NYU-Depth V2 self-supervised  | Absolute relative error (AbsRel) | 0.134 | MonoIndoor |
| 3D               | NYU-Depth V2 self-supervised  | Root mean square error (RMSE)    | 0.526 | MonoIndoor |
| 3D               | NYU-Depth V2 self-supervised  | delta_1                          | 82.3  | MonoIndoor |
| 3D               | NYU-Depth V2 self-supervised  | delta_2                          | 95.8  | MonoIndoor |
| 3D               | NYU-Depth V2 self-supervised  | delta_3                          | 98.9  | MonoIndoor |
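The metrics in the table are the standard monocular-depth evaluation quantities: AbsRel, RMSE, and the threshold accuracies delta_k, i.e. the fraction of pixels with max(pred/gt, gt/pred) < 1.25^k, reported as a percentage. A minimal reference computation (not the authors' evaluation code) looks like this:

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard depth-estimation metrics over valid (gt > 0) pixels."""
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = {f"delta_{k}": 100.0 * float(np.mean(ratio < 1.25 ** k))
              for k in (1, 2, 3)}
    return {"AbsRel": abs_rel, "RMSE": rmse, **deltas}

# Toy sanity check: a perfect prediction gives zero error and 100% deltas.
gt = np.array([1.0, 2.0, 3.0])
metrics = depth_metrics(gt.copy(), gt)
```

Note that delta_1 is the strictest of the three thresholds, which is why its value (82.3) is the lowest in the table.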

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)