Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PanoDepth: A Two-Stage Approach for Monocular Omnidirectional Depth Estimation

Yuyan Li, Zhixin Yan, Ye Duan, Liu Ren

2022-02-02 · Stereo Matching · Autonomous Driving · Depth Estimation · Monocular Depth Estimation

Paper · PDF

Abstract

Omnidirectional 3D information is essential for a wide range of applications such as Virtual Reality, Autonomous Driving, and Robotics. In this paper, we propose a novel, model-agnostic, two-stage pipeline for omnidirectional monocular depth estimation. Our proposed framework, PanoDepth, takes a single 360° image as input, produces one or more synthesized views in the first stage, and feeds the original image and the synthesized images into the subsequent stereo matching stage. In the second stage, we propose a differentiable Spherical Warping Layer to handle omnidirectional stereo geometry efficiently and effectively. By utilizing explicit stereo-based geometric constraints in the stereo matching stage, PanoDepth can generate dense, high-quality depth. We conducted extensive experiments and ablation studies to evaluate PanoDepth, both as a full pipeline and at the level of the individual modules in each stage. Our results show that PanoDepth outperforms the state-of-the-art approaches by a large margin for 360° monocular depth estimation.
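The core geometric idea in the second stage is spherical warping: for a hypothesised depth, each pixel of the 360° image is back-projected to 3D, shifted by the synthetic-view baseline, and re-projected onto the sphere to find where the stereo match should be sampled. The snippet below is a minimal NumPy sketch of that computation for equirectangular images with an assumed vertical baseline; `spherical_warp_coords` is a hypothetical helper for illustration, not the paper's actual Spherical Warping Layer (which is differentiable and operates on cost volumes).

```python
import numpy as np

def spherical_warp_coords(h, w, depth, baseline=0.2):
    """Illustrative spherical warp for equirectangular stereo.

    For each pixel of the target 360-degree view, back-project at the
    hypothesised depth, translate by a vertical baseline, and re-project
    to get the sampling coordinates in the synthesized source view.
    (Hypothetical sketch; not the paper's exact layer.)
    """
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    lon = (u + 0.5) / w * 2 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = (v + 0.5) / h * np.pi - np.pi / 2        # latitude in (-pi/2, pi/2)
    # Back-project to 3D at the given depth (y points "up").
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    # Assumed: the synthesized camera sits `baseline` below the original.
    y = y + baseline
    # Re-project the shifted point onto the unit sphere.
    r = np.sqrt(x**2 + y**2 + z**2)
    lon2 = np.arctan2(x, z)
    lat2 = np.arcsin(np.clip(y / r, -1.0, 1.0))
    # Back to equirectangular pixel coordinates.
    u2 = (lon2 + np.pi) / (2 * np.pi) * w - 0.5
    v2 = (lat2 + np.pi / 2) / np.pi * h - 0.5
    return u2, v2
```

As a sanity check, the warp collapses to the identity as depth grows (distant points are unaffected by a small baseline), while at close range the sampling coordinates shift vertically — exactly the depth-dependent disparity a cost volume over candidate depths exploits.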

Results

Task             | Dataset                | Metric                  | Value  | Model
Depth Estimation | Stanford2D3D Panoramic | RMSE                    | 0.3747 | PanoDepth
Depth Estimation | Stanford2D3D Panoramic | absolute relative error | 0.0972 | PanoDepth
3D               | Stanford2D3D Panoramic | RMSE                    | 0.3747 | PanoDepth
3D               | Stanford2D3D Panoramic | absolute relative error | 0.0972 | PanoDepth
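The two metrics in the table have standard definitions in depth-estimation benchmarks: RMSE is the root-mean-square error between predicted and ground-truth depth, and absolute relative error normalises each per-pixel error by the ground-truth depth. A minimal sketch (evaluation masks, depth caps, and median scaling vary between papers, so this is the generic form, not necessarily PanoDepth's exact protocol):

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """RMSE and absolute relative error over valid ground-truth pixels."""
    mask = gt > eps                         # ignore pixels with no ground truth
    p, g = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((p - g) ** 2))   # root-mean-square error (metres)
    abs_rel = np.mean(np.abs(p - g) / g)    # unitless relative error
    return rmse, abs_rel
```

For example, a prediction that overestimates every depth by 10% yields an absolute relative error of 0.10 regardless of scene scale, which is why abs-rel is the more scale-robust of the two numbers.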

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)