Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation

Yiran Wang, Min Shi, Jiaqi Li, Chaoyi Hong, Zihao Huang, Juewen Peng, Zhiguo Cao, Jianming Zhang, Ke Xian, Guosheng Lin

2023-07-17 · ICCV 2023
Tasks: Novel View Synthesis, Semantic Segmentation, 3D Reconstruction, Depth Estimation, Video Semantic Segmentation, Monocular Depth Estimation
Links: Paper · PDF · Code (official)

Abstract

Video depth estimation aims to infer temporally consistent depth. One approach is to finetune a single-image model on each video with geometry constraints, which is inefficient and lacks robustness. An alternative is learning to enforce consistency from data, which requires well-designed models and sufficient video depth data. To address both challenges, we introduce NVDS+, which stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner. We also construct the large-scale Video Depth in the Wild (VDW) dataset, which contains 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset. Additionally, a bidirectional inference strategy is designed to improve consistency by adaptively fusing forward and backward predictions. We instantiate a model family ranging from small to large scales for different applications. The method is evaluated on the VDW dataset and three public benchmarks. To further demonstrate its versatility, we extend NVDS+ to video semantic segmentation and several downstream applications such as bokeh rendering, novel view synthesis, and 3D reconstruction. Experimental results show that our method achieves significant improvements in consistency, accuracy, and efficiency. Our work serves as a solid baseline and data foundation for learning-based video depth estimation. Code and dataset are available at: https://github.com/RaymondWang987/NVDS
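The abstract mentions a bidirectional inference strategy that "adaptively fuses forward and backward predictions" but gives no implementation detail. A minimal sketch of one plausible per-frame fusion scheme follows; the function and the confidence-weight inputs are hypothetical illustrations, not the authors' actual method:

```python
import numpy as np

def fuse_bidirectional(fwd, bwd, w_fwd, w_bwd):
    """Hypothetical sketch: fuse forward- and backward-pass depth maps
    for one frame using per-pixel confidence weights.

    fwd, bwd   : depth maps from the forward and backward passes
    w_fwd, w_bwd : non-negative per-pixel confidence maps
    Returns the confidence-weighted average of the two predictions.
    """
    total = w_fwd + w_bwd + 1e-8  # guard against division by zero
    return (w_fwd * fwd + w_bwd * bwd) / total
```

With equal confidences this reduces to a plain average; where one direction is more confident (e.g. near occlusion boundaries revealed only in one temporal direction), its prediction dominates.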

Results

Task             | Dataset      | Metric                  | Value  | Model
Depth Estimation | NYU-Depth V2 | Delta < 1.25            | 0.9493 | NVDS (DPT-L)
Depth Estimation | NYU-Depth V2 | Delta < 1.25^2          | 0.991  | NVDS (DPT-L)
Depth Estimation | NYU-Depth V2 | Delta < 1.25^3          | 0.997  | NVDS (DPT-L)
Depth Estimation | NYU-Depth V2 | RMSE                    | 0.282  | NVDS (DPT-L)
Depth Estimation | NYU-Depth V2 | Absolute relative error | 0.072  | NVDS (DPT-L)
Depth Estimation | NYU-Depth V2 | log10                   | 0.031  | NVDS (DPT-L)
3D               | NYU-Depth V2 | Delta < 1.25            | 0.9493 | NVDS (DPT-L)
3D               | NYU-Depth V2 | Delta < 1.25^2          | 0.991  | NVDS (DPT-L)
3D               | NYU-Depth V2 | Delta < 1.25^3          | 0.997  | NVDS (DPT-L)
3D               | NYU-Depth V2 | RMSE                    | 0.282  | NVDS (DPT-L)
3D               | NYU-Depth V2 | Absolute relative error | 0.072  | NVDS (DPT-L)
3D               | NYU-Depth V2 | log10                   | 0.031  | NVDS (DPT-L)
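The metrics in the table above are the standard monocular depth evaluation measures. Their definitions are well established (threshold accuracy delta, RMSE, absolute relative error, mean log10 error), and can be computed as follows; the function name is illustrative:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth-estimation metrics as reported on NYU-Depth V2.

    pred, gt : arrays of predicted and ground-truth depth,
               same shape, strictly positive values.
    """
    # Threshold accuracy: fraction of pixels where max(pred/gt, gt/pred) < t
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "delta<1.25":   float(np.mean(ratio < 1.25)),
        "delta<1.25^2": float(np.mean(ratio < 1.25 ** 2)),
        "delta<1.25^3": float(np.mean(ratio < 1.25 ** 3)),
        "AbsRel": float(np.mean(np.abs(pred - gt) / gt)),
        "RMSE":   float(np.sqrt(np.mean((pred - gt) ** 2))),
        "log10":  float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
    }
```

Higher is better for the delta thresholds; lower is better for AbsRel, RMSE, and log10, which matches the direction of the values reported above.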

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
AutoPartGen: Autogressive 3D Part Generation and Discovery (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)