Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

2024-07-07
Tasks: Unsupervised Monocular Depth Estimation, Autonomous Driving, Camera Pose Estimation, Depth Estimation, Monocular Depth Estimation

Abstract

Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses inevitably degrade the photometric reconstruction and mislead the depth estimation networks with wrong supervisory signals. In this article, we introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning. Specifically, a confidence-aware feature flow estimator is proposed to acquire 2D feature positional translations and their associated confidence levels. Meanwhile, we introduce a positional clue aggregator, which integrates pseudo 3D point clouds from DepthNet and 2D feature flows into homogeneous positional representations. Finally, a hierarchical positional embedding injector is proposed to selectively inject spatial clues into semantic features for robust camera pose decoding. Extensive experiments and analyses demonstrate the superior performance of our model compared to other state-of-the-art methods. Remarkably, SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for the camera pose estimation task on the KITTI Odometry dataset. Our source code is available at https://mias.group/SCIPaD.
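The abstract does not say how the pseudo 3D point clouds are derived from the DepthNet output. One common construction, assumed here purely for illustration, back-projects every pixel of a depth map through a pinhole camera model. The function name `backproject` and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the SCIPaD code.

```python
def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points.

    Assumption: an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy). `depth` is a list of rows, where row index v
    and column index u are pixel coordinates and each entry is the depth
    along the optical axis.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            # Invert the pinhole projection u = fx * X / Z + cx (likewise for v).
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


# Tiny 2x2 depth map with made-up intrinsics, for illustration only.
cloud = backproject([[2.0, 2.0], [4.0, 4.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

A real pipeline would run this as a batched tensor operation on the predicted depth; the nested loop only makes the geometry explicit.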

Results

Task                    Dataset                         Metric                              Value  Model
Depth Estimation        KITTI Eigen split unsupervised  Delta < 1.25                        0.918  SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  Delta < 1.25^2                      0.97   SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  Delta < 1.25^3                      0.985  SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  RMSE                                4.056  SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  RMSE log                            0.166  SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  Sq Rel                              0.65   SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  absolute relative error             0.09   SCIPaD
Depth Estimation        KITTI Eigen split unsupervised  Delta < 1.25                        0.897  SCIPaD(M+640x192)
Depth Estimation        KITTI Eigen split unsupervised  Delta < 1.25^2                      0.964  SCIPaD(M+640x192)
Depth Estimation        KITTI Eigen split unsupervised  Delta < 1.25^3                      0.983  SCIPaD(M+640x192)
Depth Estimation        KITTI Eigen split unsupervised  RMSE                                4.391  SCIPaD(M+640x192)
Depth Estimation        KITTI Eigen split unsupervised  RMSE log                            0.175  SCIPaD(M+640x192)
Depth Estimation        KITTI Eigen split unsupervised  Sq Rel                              0.732  SCIPaD(M+640x192)
Depth Estimation        KITTI Eigen split unsupervised  absolute relative error             0.098  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  Delta < 1.25                        0.918  SCIPaD
3D                      KITTI Eigen split unsupervised  Delta < 1.25^2                      0.97   SCIPaD
3D                      KITTI Eigen split unsupervised  Delta < 1.25^3                      0.985  SCIPaD
3D                      KITTI Eigen split unsupervised  RMSE                                4.056  SCIPaD
3D                      KITTI Eigen split unsupervised  RMSE log                            0.166  SCIPaD
3D                      KITTI Eigen split unsupervised  Sq Rel                              0.65   SCIPaD
3D                      KITTI Eigen split unsupervised  absolute relative error             0.09   SCIPaD
3D                      KITTI Eigen split unsupervised  Delta < 1.25                        0.897  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  Delta < 1.25^2                      0.964  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  Delta < 1.25^3                      0.983  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  RMSE                                4.391  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  RMSE log                            0.175  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  Sq Rel                              0.732  SCIPaD(M+640x192)
3D                      KITTI Eigen split unsupervised  absolute relative error             0.098  SCIPaD(M+640x192)
Camera Pose Estimation  KITTI Odometry Benchmark        Absolute Trajectory Error [m]       20.83  SCIPaD
Camera Pose Estimation  KITTI Odometry Benchmark        Average Rotational Error er[%]      3.17   SCIPaD
Camera Pose Estimation  KITTI Odometry Benchmark        Average Translational Error et[%]   8.63   SCIPaD
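
The depth metrics reported above (absolute relative error, Sq Rel, RMSE, RMSE log, and the Delta thresholds) follow the conventional definitions used for monocular depth evaluation. The sketch below shows how they are typically computed from paired ground-truth and predicted depths. It assumes all depths are strictly positive and already masked to valid pixels; the function name `depth_metrics` is hypothetical and this is not the SCIPaD evaluation script.

```python
import math


def depth_metrics(gt, pred):
    """Compute the conventional depth-evaluation metrics.

    Assumption: `gt` and `pred` are equal-length sequences of strictly
    positive depths (e.g. in metres) covering valid pixels only.
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and the same length")
    n = len(gt)
    pairs = list(zip(gt, pred))
    # Per-pixel ratio used by the Delta accuracy thresholds.
    ratios = [max(g / p, p / g) for g, p in pairs]
    return {
        "abs_rel": sum(abs(g - p) / g for g, p in pairs) / n,
        "sq_rel": sum((g - p) ** 2 / g for g, p in pairs) / n,
        "rmse": math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
        ),
        # Fraction of pixels whose ratio falls under each threshold.
        "delta_1": sum(r < 1.25 for r in ratios) / n,
        "delta_2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta_3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


# Toy example: one pixel underestimated by half, one pixel exact.
metrics = depth_metrics([2.0, 4.0], [1.0, 4.0])
```

Lower is better for the four error metrics, while the Delta values are accuracies in [0, 1] where higher is better.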

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan : Language-Guided Visual Path Planning with RLVR (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)