Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation

Mykola Lavreniuk

2024-04-18
Tasks: Unsupervised Monocular Depth Estimation, Scene Understanding, Autonomous Driving, Depth Estimation, Monocular Depth Estimation

Abstract

Self-supervised monocular depth estimation has garnered considerable attention for its applications in autonomous driving and robotics. While recent methods have made strides in leveraging techniques like the Self Query Layer (SQL) to infer depth from motion, they often overlook the potential of strengthening pose information. In this paper, we introduce SPIdepth, a novel approach that prioritizes enhancing the pose network for improved depth estimation. Building upon the foundation laid by SQL, SPIdepth emphasizes the importance of pose information in capturing fine-grained scene structures. By enhancing the pose network's capabilities, SPIdepth achieves remarkable advancements in scene understanding and depth estimation. Experimental results on benchmark datasets such as KITTI, Cityscapes, and Make3D showcase SPIdepth's state-of-the-art performance, surpassing previous methods by significant margins. Specifically, SPIdepth tops the self-supervised KITTI benchmark. Additionally, SPIdepth achieves the lowest AbsRel (0.029), SqRel (0.069), and RMSE (1.394) on KITTI, establishing new state-of-the-art results. On Cityscapes, SPIdepth shows improvements over SQLdepth of 21.7% in AbsRel, 36.8% in SqRel, and 16.5% in RMSE, even without using motion masks. On Make3D, SPIdepth in zero-shot outperforms all other models. Remarkably, SPIdepth achieves these results using only a single image for inference, surpassing even methods that utilize video sequences for inference, thus demonstrating its efficacy and efficiency in real-world applications. Our approach represents a significant leap forward in self-supervised monocular depth estimation, underscoring the importance of strengthening pose information for advancing scene understanding in real-world applications. The code and pre-trained models are publicly available at https://github.com/Lavreniuk/SPIdepth.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.99 | SPIDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.999 | SPIDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 1 | SPIDepth |
| Depth Estimation | KITTI Eigen split | RMSE | 1.394 | SPIDepth |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.048 | SPIDepth |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.069 | SPIDepth |
| Depth Estimation | KITTI Eigen split | Absolute relative error | 0.029 | SPIDepth |
| Depth Estimation | Make3D | Abs Rel | 0.299 | SPIDepth |
| Depth Estimation | Make3D | RMSE | 6.672 | SPIDepth |
| Depth Estimation | Make3D | Sq Rel | 1.931 | SPIDepth |
| Depth Estimation | KITTI Eigen split unsupervised | Delta < 1.25 | 0.94 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | Delta < 1.25^2 | 0.973 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | Delta < 1.25^3 | 0.985 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | RMSE | 3.662 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | RMSE log | 0.153 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | Sq Rel | 0.531 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | Test frames | 1 | SPIdepth |
| Depth Estimation | KITTI Eigen split unsupervised | Absolute relative error | 0.071 | SPIdepth |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.99 | SPIDepth |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.999 | SPIDepth |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 1 | SPIDepth |
| 3D | KITTI Eigen split | RMSE | 1.394 | SPIDepth |
| 3D | KITTI Eigen split | RMSE log | 0.048 | SPIDepth |
| 3D | KITTI Eigen split | Sq Rel | 0.069 | SPIDepth |
| 3D | KITTI Eigen split | Absolute relative error | 0.029 | SPIDepth |
| 3D | Make3D | Abs Rel | 0.299 | SPIDepth |
| 3D | Make3D | RMSE | 6.672 | SPIDepth |
| 3D | Make3D | Sq Rel | 1.931 | SPIDepth |
| 3D | KITTI Eigen split unsupervised | Delta < 1.25 | 0.94 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | Delta < 1.25^2 | 0.973 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | Delta < 1.25^3 | 0.985 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | RMSE | 3.662 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | RMSE log | 0.153 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | Sq Rel | 0.531 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | Test frames | 1 | SPIdepth |
| 3D | KITTI Eigen split unsupervised | Absolute relative error | 0.071 | SPIdepth |
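
The metrics reported above (Abs Rel, Sq Rel, RMSE, RMSE log, and the Delta < 1.25^k accuracies) follow the definitions commonly used in monocular depth evaluation. The sketch below shows one way to compute them in plain Python. It is an illustration under stated assumptions, not the SPIdepth evaluation code: the function name `depth_metrics` and the toy depth values are hypothetical, and protocol details that this page does not state (depth caps, crops, any scale alignment between prediction and ground truth) are left out.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth error and accuracy metrics.

    pred, gt: equal-length sequences of positive depths, assumed to be
    already restricted to pixels with valid ground truth.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Error metrics: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Accuracy metrics: fraction of pixels whose ratio max(p/g, g/p)
    # falls below 1.25, 1.25^2, and 1.25^3. Higher is better.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_1": deltas[1],
        "delta_2": deltas[2],
        "delta_3": deltas[3],
    }


# Toy values for illustration only (not taken from any benchmark).
toy_pred = [10.0, 20.0, 33.0]
toy_gt = [10.0, 30.0, 30.0]
metrics = depth_metrics(toy_pred, toy_gt)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reading the table with these definitions in mind: the three error metrics and RMSE log should decrease as a model improves, while the Delta accuracies approach 1.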

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)