3DVNet: Multi-View Depth Prediction and Volumetric Refinement

Alexander Rich, Noah Stier, Pradeep Sen, Tobias Höllerer

2021-12-01

3D Action Recognition, Depth Prediction, Prediction, 3D Reconstruction, Depth Estimation

Abstract

We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions that agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as on a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. These results indicate that our method is effective and generalizes to new settings.
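To make the phrase "feature-augmented point cloud" in world space more concrete, the following is a minimal NumPy sketch of how several depth maps could be back-projected into one joint point cloud with per-pixel features attached. It is an illustration under stated assumptions, not the authors' implementation: the function names `backproject_depth` and `joint_point_cloud`, the pinhole intrinsics `K`, the 4x4 camera-to-world poses, and the feature arrays are all hypothetical, and the 3D CNN and the iterative depth refinement are not shown.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world, feats=None):
    """Lift one depth map to world-space points (hypothetical helper).

    depth: (H, W) depths along the camera z-axis; values <= 0 are treated as invalid.
    K: (3, 3) pinhole intrinsics. cam_to_world: (4, 4) rigid pose.
    feats: optional (H, W, C) per-pixel features to attach to each point.
    """
    h, w = depth.shape
    # Pixel grid: u is the column index, v is the row index.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Camera-space points: X_cam = d * K^{-1} [u, v, 1]^T.
    pts_cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Rigid transform into world space.
    pts_world = pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]
    valid = depth.reshape(-1) > 0
    pts_world = pts_world[valid]
    if feats is None:
        return pts_world
    # Append the per-pixel features to the xyz coordinates.
    return np.concatenate([pts_world, feats.reshape(h * w, -1)[valid]], axis=1)

def joint_point_cloud(depths, Ks, poses, feats_list):
    """Merge all views into a single (N, 3 + C) world-space point cloud."""
    clouds = [backproject_depth(d, K, T, f)
              for d, K, T, f in zip(depths, Ks, poses, feats_list)]
    return np.concatenate(clouds, axis=0)
```

Because every view is expressed in the same world frame, points from different depth maps that observe the same surface land near each other, which is what would let a volumetric network reason about all depth maps jointly.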

Results

Task	Dataset	Metric	Value	Model
Video	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
Video	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
Temporal Action Localization	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
Temporal Action Localization	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
Zero-Shot Learning	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
Zero-Shot Learning	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
Activity Recognition	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
Activity Recognition	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
Action Localization	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
Action Localization	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
3D Action Recognition	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
3D Action Recognition	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
Action Recognition	NTU RGB+D	Cross Subject Accuracy	88.8	3DV-PointNet++
Action Recognition	NTU RGB+D	Cross View Accuracy	96.3	3DV-PointNet++
