Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli

2024-03-19 · Future prediction · Depth Estimation · Monocular Depth Estimation
Paper · PDF

Abstract

In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by learning to predict the future during training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame features one time step ahead, iteratively. In this way, F-Net learns the underlying motion and correspondence information, and we incorporate its features into the depth decoding process. Additionally, to enrich the learning of multi-frame correspondence cues, we further leverage a reconstruction network, R-Net, which is trained via adaptively masked auto-encoding of multi-frame feature volumes. At inference time, both F-Net and R-Net are used to produce queries to work with the depth decoder, as well as a final refinement network. Through extensive experiments on several benchmarks, i.e., NYUDv2, KITTI, DDAD, and Sintel, which cover indoor, driving, and open-domain scenarios, we show that FutureDepth significantly improves upon baseline models, outperforms existing video depth estimation methods, and sets new state-of-the-art (SOTA) accuracy. Furthermore, FutureDepth is more efficient than existing SOTA video depth estimation models and has latency similar to monocular models.
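The iterative one-step-ahead feature prediction described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: `fnet_step` is a hypothetical linear stand-in for F-Net, and the window size and feature dimension are arbitrary. The point is the rollout pattern, where each predicted frame feature is appended and the window slides forward one step.

```python
import numpy as np

def fnet_step(feats, W):
    # Hypothetical linear predictor standing in for F-Net: maps the
    # stacked features of N consecutive frames to the predicted
    # feature of the next frame.
    return np.tanh(W @ feats.reshape(-1))

def rollout(frame_feats, W, steps):
    """Iteratively predict `steps` future frame features.

    frame_feats: (N, D) features of N consecutive frames (most recent last).
    Each prediction is appended and the window slides forward one frame,
    mirroring the iterative one-step-ahead prediction in the abstract.
    """
    n = len(frame_feats)
    window = list(frame_feats)
    preds = []
    for _ in range(steps):
        nxt = fnet_step(np.stack(window[-n:]), W)
        preds.append(nxt)
        window.append(nxt)
    return np.stack(preds)

rng = np.random.default_rng(0)
N, D = 4, 8                                  # 4-frame window, 8-dim features
W = rng.normal(scale=0.1, size=(D, N * D))   # toy predictor weights
feats = rng.normal(size=(N, D))
future = rollout(feats, W, steps=3)
print(future.shape)  # (3, 8)
```

During training, a real F-Net would be optimized so that each predicted feature matches the encoder feature of the actual next frame, which is what forces it to internalize motion and correspondence cues.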

Results

| Task | Dataset | Metric | Value | Model |
|------|---------|--------|-------|-------|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.981 | FutureDepth |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.996 | FutureDepth |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | FutureDepth |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.233 | FutureDepth |
| Depth Estimation | NYU-Depth V2 | Absolute relative error | 0.063 | FutureDepth |
| Depth Estimation | NYU-Depth V2 | log10 | 0.027 | FutureDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.984 | FutureDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.998 | FutureDepth |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 1 | FutureDepth |
| Depth Estimation | KITTI Eigen split | RMSE | 1.856 | FutureDepth |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.066 | FutureDepth |
| Depth Estimation | KITTI Eigen split | Squared relative error (Sq Rel) | 0.117 | FutureDepth |
| Depth Estimation | KITTI Eigen split | Absolute relative error | 0.041 | FutureDepth |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.981 | FutureDepth |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.996 | FutureDepth |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | FutureDepth |
| 3D | NYU-Depth V2 | RMSE | 0.233 | FutureDepth |
| 3D | NYU-Depth V2 | Absolute relative error | 0.063 | FutureDepth |
| 3D | NYU-Depth V2 | log10 | 0.027 | FutureDepth |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.984 | FutureDepth |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.998 | FutureDepth |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 1 | FutureDepth |
| 3D | KITTI Eigen split | RMSE | 1.856 | FutureDepth |
| 3D | KITTI Eigen split | RMSE log | 0.066 | FutureDepth |
| 3D | KITTI Eigen split | Squared relative error (Sq Rel) | 0.117 | FutureDepth |
| 3D | KITTI Eigen split | Absolute relative error | 0.041 | FutureDepth |
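The metrics in the table above are the standard monocular depth evaluation measures: the accuracy ratios Delta < 1.25^k, absolute and squared relative error, RMSE, RMSE in log space, and mean log10 error. A minimal sketch of how they are computed from predicted and ground-truth depth maps (function and key names are illustrative, not from the paper):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics over positive depth arrays.

    pred, gt: arrays of the same shape containing valid (> 0) depths.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "delta_1":  np.mean(ratio < 1.25),        # Delta < 1.25
        "delta_2":  np.mean(ratio < 1.25 ** 2),   # Delta < 1.25^2
        "delta_3":  np.mean(ratio < 1.25 ** 3),   # Delta < 1.25^3
        "abs_rel":  np.mean(np.abs(pred - gt) / gt),
        "sq_rel":   np.mean((pred - gt) ** 2 / gt),
        "rmse":     np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "log10":    np.mean(np.abs(np.log10(pred) - np.log10(gt))),
    }

gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 1.9, 4.4])
m = depth_metrics(pred, gt)
print(round(m["abs_rel"], 4))  # mean(0.1/1, 0.1/2, 0.4/4) -> 0.0833
```

In practice these are evaluated only over valid ground-truth pixels, often after median scaling or depth-range capping, which varies by benchmark protocol.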

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)
Cameras as Relative Positional Encoding (2025-07-14)
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)