SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

2024-07-07
Unsupervised Monocular Depth Estimation, Autonomous Driving, Camera Pose Estimation, Depth Estimation, Monocular Depth Estimation


Abstract

Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses inevitably degrade the photometric reconstruction and mislead the depth estimation networks with incorrect supervisory signals. In this article, we introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning. Specifically, a confidence-aware feature flow estimator is proposed to acquire 2D feature positional translations and their associated confidence levels. Meanwhile, we introduce a positional clue aggregator, which integrates pseudo 3D point clouds from DepthNet and 2D feature flows into homogeneous positional representations. Finally, a hierarchical positional embedding injector is proposed to selectively inject spatial clues into semantic features for robust camera pose decoding. Extensive experiments and analyses demonstrate the superior performance of our model compared to other state-of-the-art methods. Remarkably, SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for the camera pose estimation task on the KITTI Odometry dataset. Our source code is available at https://mias.group/SCIPaD.
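
The abstract mentions pseudo 3D point clouds obtained from DepthNet but does not say how they are built. A common way to lift a predicted depth map into 3D is pinhole back-projection, sketched below with NumPy. This is an illustration under assumptions, not the authors' implementation: the function name `backproject_depth`, the intrinsic matrix `K`, and the toy values are all hypothetical.

```python
import numpy as np

def backproject_depth(depth, K):
    """Lift an (H, W) depth map to an (H, W, 3) point cloud in the camera frame.

    Assumes a pinhole camera with 3x3 intrinsic matrix K (an assumption of
    this sketch; the abstract does not specify the camera model).
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y); both have shape (H, W).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    # Homogeneous pixel coordinates [u, v, 1] for every pixel.
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Viewing rays K^{-1} [u, v, 1]^T, applied to all pixels at once.
    rays = pixels @ np.linalg.inv(K).T
    # Scale each ray by its depth to get the 3D point.
    return rays * depth[..., None]

# Toy example: focal length 2 px, principal point (1, 1), constant depth 4.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
points = backproject_depth(np.full((2, 3), 4.0), K)
```

In this toy setup the pixel at the principal point maps to a point straight ahead on the optical axis, and every point keeps its depth as the z-coordinate.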

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	KITTI Eigen split unsupervised	Delta < 1.25	0.918	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	Delta < 1.25^2	0.97	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	Delta < 1.25^3	0.985	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	RMSE	4.056	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	RMSE log	0.166	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	Sq Rel	0.65	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	absolute relative error	0.09	SCIPaD
Depth Estimation	KITTI Eigen split unsupervised	Delta < 1.25	0.897	SCIPaD(M+640x192)
Depth Estimation	KITTI Eigen split unsupervised	Delta < 1.25^2	0.964	SCIPaD(M+640x192)
Depth Estimation	KITTI Eigen split unsupervised	Delta < 1.25^3	0.983	SCIPaD(M+640x192)
Depth Estimation	KITTI Eigen split unsupervised	RMSE	4.391	SCIPaD(M+640x192)
Depth Estimation	KITTI Eigen split unsupervised	RMSE log	0.175	SCIPaD(M+640x192)
Depth Estimation	KITTI Eigen split unsupervised	Sq Rel	0.732	SCIPaD(M+640x192)
Depth Estimation	KITTI Eigen split unsupervised	absolute relative error	0.098	SCIPaD(M+640x192)
Camera Pose Estimation	KITTI Odometry Benchmark	Absolute Trajectory Error [m]	20.83	SCIPaD
Camera Pose Estimation	KITTI Odometry Benchmark	Average Rotational Error er[%]	3.17	SCIPaD
Camera Pose Estimation	KITTI Odometry Benchmark	Average Translational Error et[%]	8.63	SCIPaD
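
The depth rows above use the metric names that are conventional in the monocular depth literature. This page does not define them, so the sketch below follows the commonly used definitions and should be read as an assumption rather than the exact evaluation script behind the reported numbers. The function name `depth_metrics` is hypothetical, and preprocessing steps often used on KITTI (cropping, depth capping, median scaling) are deliberately left out.

```python
import numpy as np

def depth_metrics(gt, pred):
    """Conventional depth-error metrics over valid, strictly positive depths."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    # Per-pixel ratio used by the threshold accuracies (always >= 1).
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(gt - pred) / gt),
        "sq_rel": np.mean((gt - pred) ** 2 / gt),
        "rmse": np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        # Fraction of pixels whose ratio is below 1.25, 1.25^2, 1.25^3.
        "delta_1": np.mean(ratio < 1.25),
        "delta_2": np.mean(ratio < 1.25 ** 2),
        "delta_3": np.mean(ratio < 1.25 ** 3),
    }

# Toy example: one exact prediction and one that is off by a factor of two.
metrics = depth_metrics(gt=[2.0, 4.0], pred=[2.0, 8.0])
```

Lower is better for the four error metrics, while higher is better for the three threshold accuracies.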
