Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation

Tianwei Shen, Zixin Luo, Lei Zhou, Hanyu Deng, Runze Zhang, Tian Fang, Long Quan

2019-02-25 | Visual Odometry, Motion Estimation, Self-Supervised Learning, Camera Pose Estimation, Simultaneous Localization and Mapping


Abstract

Accurate relative pose is one of the key components in visual odometry (VO) and simultaneous localization and mapping (SLAM). Recently, the self-supervised learning framework that jointly optimizes the relative pose and target image depth has attracted the attention of the community. Previous works rely on the photometric error generated from depths and poses between adjacent frames, which contains large systematic errors in realistic scenes due to reflective surfaces and occlusions. In this paper, we bridge the gap between geometric loss and photometric loss by introducing a matching loss constrained by epipolar geometry in a self-supervised framework. Evaluated on the KITTI dataset, our method outperforms state-of-the-art unsupervised ego-motion estimation methods by a large margin. The code and data are available at https://github.com/hlzz/DeepMatchVO.
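To make the idea of a matching loss "constrained by epipolar geometry" concrete, the following is a minimal NumPy sketch of the standard epipolar constraint, not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, a relative pose `(R, t)` mapping points from the first camera frame to the second (`X2 = R @ X1 + t`), and matched pixel coordinates `pts1` and `pts2`. The function names are hypothetical, and the exact loss used in the paper may differ (for example in how residuals are weighted or aggregated).

```python
import numpy as np


def skew(t):
    """Return the 3x3 cross-product matrix [t]_x so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def fundamental_from_pose(K, R, t):
    """Build F with x2^T F x1 = 0, assuming X2 = R @ X1 + t and shared intrinsics K."""
    E = skew(t) @ R                  # essential matrix
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv       # fundamental matrix


def epipolar_distances(F, pts1, pts2):
    """Pixel distance from each point in pts2 to the epipolar line F @ x1.

    pts1, pts2: (N, 2) arrays of matched pixel coordinates.
    """
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])     # homogeneous coordinates, shape (N, 3)
    x2 = np.hstack([pts2, ones])
    lines = x1 @ F.T                 # row i is the epipolar line F @ x1[i]
    num = np.abs(np.sum(lines * x2, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return num / den
```

Averaging `epipolar_distances(...)` over a set of feature matches gives one possible geometric residual that depends only on the predicted pose, so it is unaffected by the reflective surfaces and occlusions that distort a photometric error.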

Results

Task	Dataset	Metric	Value	Model
Camera Pose Estimation	KITTI Odometry Benchmark	Absolute Trajectory Error [m]	25.76	DeepMatchVO
Camera Pose Estimation	KITTI Odometry Benchmark	Average Rotational Error er[%]	4.85	DeepMatchVO
Camera Pose Estimation	KITTI Odometry Benchmark	Average Translational Error et[%]	11.05	DeepMatchVO
