Unsupervised Monocular Depth and Ego-motion Learning with Structure and Semantics
Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova
Abstract
We present an approach that takes advantage of both structure and semantics for unsupervised monocular learning of depth and ego-motion. More specifically, we model the motion of individual objects in the scene and learn their 3D motion vectors jointly with depth and ego-motion. This yields more accurate results, especially for challenging dynamic scenes that previous approaches do not address. This is an extended version of Casser et al. [AAAI'19]. Code and models have been open sourced at https://sites.google.com/corp/view/struct2depth.
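The core geometric idea, warping pixels between frames using predicted depth, ego-motion, and a per-object 3D motion vector, can be sketched as follows. This is a minimal illustration, not the released Struct2Depth code: the function `warp_points`, the toy intrinsics, and the single object translation are all hypothetical simplifications of the per-object motion-compensated view synthesis the abstract describes.

```python
import numpy as np

def warp_points(px, depth, K, R, t, obj_mask, obj_t):
    """Warp pixels from a source to a target view.

    px:       (N, 2) pixel coordinates
    depth:    (N,)   predicted depth per pixel
    K:        (3, 3) camera intrinsics
    R, t:     predicted ego-motion (rotation, translation)
    obj_mask: (N,)   1.0 where the pixel belongs to a moving object
    obj_t:    (3,)   predicted 3D motion vector for that object
    """
    ones = np.ones((px.shape[0], 1))
    # Unproject pixels to 3D rays and scale by depth.
    rays = (np.linalg.inv(K) @ np.hstack([px, ones]).T).T
    pts = rays * depth[:, None]
    # Apply camera ego-motion to all points.
    pts = pts @ R.T + t
    # Additionally translate points on the moving object by its motion vector.
    pts = pts + obj_mask[:, None] * obj_t
    # Reproject into the target image.
    proj = pts @ K.T
    return proj[:, :2] / proj[:, 2:3]

# Toy example (hypothetical values): identity intrinsics, static camera,
# one static pixel and one pixel on an object moving 2 units along x.
K = np.eye(3)
px = np.array([[1.0, 2.0], [3.0, 4.0]])
depth = np.array([2.0, 2.0])
R, t = np.eye(3), np.zeros(3)
obj_mask = np.array([0.0, 1.0])
obj_t = np.array([2.0, 0.0, 0.0])
warped = warp_points(px, depth, K, R, t, obj_mask, obj_t)
```

The static pixel reprojects to its original location, while the object pixel is displaced by the object's motion; in training, the photometric error between the warped and observed frames supervises depth, ego-motion, and the object motion vectors jointly.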
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split, unsupervised) | absolute relative error | 0.1412 | Struct2Depth M |