Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu

2020-04-03 · CVPR 2020

Tasks: Visual Odometry · Optical Flow Estimation · Self-Supervised Learning · Depth Prediction · Pose Estimation · Depth Estimation · Monocular Depth Estimation

Links: Paper · PDF · Code (official)

Abstract

In this work, we tackle the essential problem of scale inconsistency in self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry applications. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on a PoseNet architecture, our method recovers the relative pose by directly solving for the fundamental matrix from dense optical flow correspondences, and uses a two-view triangulation module to recover an up-to-scale 3D structure. We then align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and a dense reprojection check. The whole system can be trained jointly end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on the KITTI Odometry and NYUv2 datasets. Furthermore, we present some interesting findings on the limitations of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
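The scale-alignment step described in the abstract — rescaling the network's depth prediction to match the up-to-scale triangulated point cloud before computing depth error — can be sketched in a few lines. This is a minimal illustration, not the released TrianFlow code; the function name and the choice of a median ratio as the robust scale estimate are assumptions for the sketch.

```python
import numpy as np

def align_depth_scale(pred_depth, tri_depth, mask):
    """Align a predicted depth map's scale to sparse triangulated depths.

    pred_depth : (H, W) network depth prediction (arbitrary scale)
    tri_depth  : (H, W) depths of two-view triangulated points (up-to-scale)
    mask       : (H, W) bool, True where a triangulated depth is available

    Returns the rescaled depth map and the scalar scale factor.
    """
    ratio = tri_depth[mask] / pred_depth[mask]
    scale = np.median(ratio)  # median is robust to outlier correspondences
    return scale * pred_depth, scale
```

The rescaled map can then be compared against the triangulated structure for the depth loss, so the network never has to commit to one global scale across all samples.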

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 (self-supervised) | Absolute relative error (AbsRel) | 0.189 | Zhao et al. |
| Depth Estimation | NYU-Depth V2 (self-supervised) | Root mean square error (RMSE) | 0.686 | Zhao et al. |
| Depth Estimation | NYU-Depth V2 (self-supervised) | delta_1 (%) | 70.1 | Zhao et al. |
| Depth Estimation | NYU-Depth V2 (self-supervised) | delta_2 (%) | 91.2 | Zhao et al. |
| Depth Estimation | NYU-Depth V2 (self-supervised) | delta_3 (%) | 97.8 | Zhao et al. |
| 3D | NYU-Depth V2 (self-supervised) | Absolute relative error (AbsRel) | 0.189 | Zhao et al. |
| 3D | NYU-Depth V2 (self-supervised) | Root mean square error (RMSE) | 0.686 | Zhao et al. |
| 3D | NYU-Depth V2 (self-supervised) | delta_1 (%) | 70.1 | Zhao et al. |
| 3D | NYU-Depth V2 (self-supervised) | delta_2 (%) | 91.2 | Zhao et al. |
| 3D | NYU-Depth V2 (self-supervised) | delta_3 (%) | 97.8 | Zhao et al. |
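The metrics in the table are the standard monocular-depth evaluation suite: AbsRel and RMSE measure error, while delta_k is the percentage of pixels whose depth ratio max(pred/gt, gt/pred) falls below 1.25^k. A minimal sketch of how these are conventionally computed (the function name is illustrative; the formulas are the community-standard definitions, not code from this paper):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular-depth metrics over valid pixels.

    pred, gt : arrays of positive depths with matching shapes.
    Returns (AbsRel, RMSE, [delta_1, delta_2, delta_3]),
    with the delta accuracies expressed as percentages.
    """
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [100.0 * np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]
    return abs_rel, rmse, deltas
```

For scale-ambiguous self-supervised methods such as this one, predictions are typically median-scaled to ground truth before these metrics are computed.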

Related Papers

- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
- A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)