Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

3DVNet: Multi-View Depth Prediction and Volumetric Refinement

Alexander Rich, Noah Stier, Pradeep Sen, Tobias Höllerer

2021-12-01
Tasks: 3D Action Recognition, Depth Prediction, Prediction, 3D Reconstruction, Depth Estimation
Abstract

We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.
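The abstract describes a 3D network that operates in world space on all depth maps jointly, via a point cloud. As a rough illustration of the first step such a pipeline needs, the sketch below back-projects per-view depth maps into one shared world-space point cloud. It is not the authors' implementation: the function names `backproject_depth` and `fuse_views`, the pinhole intrinsics `K`, and the 4x4 camera-to-world pose convention are all assumptions made for this example, and the per-point image features mentioned in the abstract are omitted.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to an (H*W, 3) world-space point cloud.

    Assumes a pinhole camera with 3x3 intrinsics K and a 4x4
    camera-to-world pose (hypothetical conventions for this sketch).
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    # Rays in camera coordinates with unit z, then scaled by depth.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # Rigid transform from camera coordinates into world coordinates.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return points_cam @ R.T + t

def fuse_views(depths, Ks, poses):
    """Concatenate the back-projected points of every view."""
    clouds = [backproject_depth(d, K, T) for d, K, T in zip(depths, Ks, poses)]
    return np.concatenate(clouds, axis=0)
```

In a refinement loop of the kind the abstract outlines, a cloud like this would be rebuilt from the updated depth maps after each iteration; how the network consumes it is beyond what this sketch covers.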

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
Video | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++
Temporal Action Localization | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
Temporal Action Localization | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++
Zero-Shot Learning | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
Zero-Shot Learning | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++
Activity Recognition | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
Activity Recognition | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++
Action Localization | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
Action Localization | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++
3D Action Recognition | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
3D Action Recognition | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++
Action Recognition | NTU RGB+D | Cross Subject Accuracy | 88.8 | 3DV-PointNet++
Action Recognition | NTU RGB+D | Cross View Accuracy | 96.3 | 3DV-PointNet++

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
AutoPartGen: Autogressive 3D Part Generation and Discovery (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)