Unsupervised Monocular Depth and Ego-motion Learning with Structure and Semantics
Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova
Abstract
We present an approach that takes advantage of both structure and semantics for unsupervised monocular learning of depth and ego-motion. More specifically, we model the motion of individual objects in the scene and learn their 3D motion vectors jointly with depth and ego-motion. This yields more accurate results, especially for challenging dynamic scenes that previous approaches do not address. This is an extended version of Casser et al. [AAAI'19]. Code and models have been open sourced at https://sites.google.com/corp/view/struct2depth.
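The core geometric idea, warping pixels between frames using predicted depth, ego-motion, and a per-object 3D motion vector, can be sketched as follows. This is a minimal illustration, not the released Struct2Depth code: the function `warp_points`, the toy intrinsics, and the single object translation are all hypothetical simplifications of the per-object motion-compensated view synthesis the abstract describes.

```python
import numpy as np

def warp_points(px, depth, K, R, t, obj_mask, obj_t):
    """Warp pixels from a source to a target view.

    px:       (N, 2) pixel coordinates
    depth:    (N,)   predicted depth per pixel
    K:        (3, 3) camera intrinsics
    R, t:     predicted ego-motion (rotation, translation)
    obj_mask: (N,)   1.0 where the pixel belongs to a moving object
    obj_t:    (3,)   predicted 3D motion vector for that object
    """
    ones = np.ones((px.shape[0], 1))
    # Unproject pixels to 3D rays and scale by depth.
    rays = (np.linalg.inv(K) @ np.hstack([px, ones]).T).T
    pts = rays * depth[:, None]
    # Apply camera ego-motion to all points.
    pts = pts @ R.T + t
    # Additionally translate points on the moving object by its motion vector.
    pts = pts + obj_mask[:, None] * obj_t
    # Reproject into the target image.
    proj = pts @ K.T
    return proj[:, :2] / proj[:, 2:3]

# Toy example (hypothetical values): identity intrinsics, static camera,
# one static pixel and one pixel on an object moving 2 units along x.
K = np.eye(3)
px = np.array([[1.0, 2.0], [3.0, 4.0]])
depth = np.array([2.0, 2.0])
R, t = np.eye(3), np.zeros(3)
obj_mask = np.array([0.0, 1.0])
obj_t = np.array([2.0, 0.0, 0.0])
warped = warp_points(px, depth, K, R, t, obj_mask, obj_t)
```

The static pixel reprojects to its original location, while the object pixel is displaced by the object's motion; in training, the photometric error between the warped and observed frames supervises depth, ego-motion, and the object motion vectors jointly.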
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split, unsupervised) | absolute relative error | 0.1412 | Struct2Depth M |