Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Voxel RoI Pooling

Computer Vision · Introduced 2000 · 5 papers
Source Paper

Description

Voxel RoI Pooling is a RoI feature extractor that extracts RoI features directly from voxel features for further refinement. It starts by dividing a region proposal into $G \times G \times G$ regular sub-voxels. The center point of each sub-voxel is taken as its grid point. Since 3D feature volumes are extremely sparse (non-empty voxels account for $<3\%$ of the space), we cannot directly apply max pooling over the features of each sub-voxel. Instead, features from neighboring voxels are integrated into the grid points for feature extraction. Specifically, given a grid point $\mathbf{g}_{i}$, we first exploit voxel query to group a set of neighboring voxels $\Gamma_{i}=\left\{\mathbf{v}_{i}^{1}, \mathbf{v}_{i}^{2}, \cdots, \mathbf{v}_{i}^{K}\right\}$. Then, we aggregate the neighboring voxel features with a PointNet module as:

$$\boldsymbol{\eta}_{i}=\max_{k=1,2,\cdots,K}\left\{\Psi\left(\left[\mathbf{v}_{i}^{k}-\mathbf{g}_{i} ;\ \boldsymbol{\phi}_{i}^{k}\right]\right)\right\}$$

where $\mathbf{v}_{i}^{k}-\mathbf{g}_{i}$ represents the relative coordinates, $\boldsymbol{\phi}_{i}^{k}$ is the voxel feature of $\mathbf{v}_{i}^{k}$, and $\Psi(\cdot)$ indicates an MLP. The max pooling operation $\max(\cdot)$ is performed channel-wise over the $K$ neighboring voxels to obtain the aggregated feature vector $\boldsymbol{\eta}_{i}$. In particular, Voxel RoI pooling is used to extract voxel features from the 3D feature volumes produced by the last two stages of the 3D backbone network. For each stage, two Manhattan distance thresholds are set to group voxels at multiple scales. We then concatenate the aggregated features pooled from the different stages and scales to obtain the RoI features.
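
The grid-point and voxel-query steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the reference implementation. The proposal is treated as an axis-aligned box (real proposals usually carry a heading angle), the Manhattan distance is measured between integer voxel indices, and the names `roi_grid_points`, `voxel_query`, `voxel_size`, `origin`, and `max_dist` are hypothetical.

```python
import numpy as np

def roi_grid_points(center, size, G):
    """Return the (G^3, 3) centers of the G x G x G sub-voxels of a box.

    Assumption: the box is axis-aligned, given by its center and full size.
    """
    center = np.asarray(center, dtype=float)
    size = np.asarray(size, dtype=float)
    # Sub-voxel centers sit at (j + 0.5) / G of the extent, shifted to [-0.5, 0.5].
    offsets = (np.arange(G) + 0.5) / G - 0.5
    ox, oy, oz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    unit = np.stack([ox, oy, oz], axis=-1).reshape(-1, 3)
    return center + unit * size

def voxel_query(grid_point, voxel_indices, voxel_size, origin, max_dist, K):
    """Return positions (into voxel_indices) of up to K non-empty voxels whose
    Manhattan distance to the grid point's voxel is at most max_dist.

    Assumption: distance is counted in voxel-index units.
    """
    voxel_indices = np.asarray(voxel_indices)
    # Integer index of the voxel that contains the grid point.
    g_idx = np.floor((np.asarray(grid_point) - origin) / voxel_size).astype(int)
    dist = np.abs(voxel_indices - g_idx).sum(axis=1)
    hits = np.flatnonzero(dist <= max_dist)
    return hits[:K]  # keep at most K neighbors

# Usage: a 2 m cube split into 2 x 2 x 2 sub-voxels, then one query.
points = roi_grid_points(center=[0.0, 0.0, 0.0], size=[2.0, 2.0, 2.0], G=2)
non_empty = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 1], [5, 5, 5]])
neighbors = voxel_query(points[-1], non_empty, voxel_size=1.0,
                        origin=np.zeros(3), max_dist=2, K=16)
```

Only the non-empty voxels are stored and searched, which reflects the sparsity noted in the description. Truncating to the first `K` hits is one simple selection rule chosen for this sketch.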

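The aggregation formula and the final concatenation can likewise be sketched in NumPy. This is a hedged illustration only. $\Psi$ is reduced to a single linear layer with ReLU, the weights `W` and `b` are shared across scales for brevity, grid points with no neighbors are given a zero vector, and the names `aggregate_grid_point` and `pool_multi_scale` are hypothetical. None of these choices come from the description above.

```python
import numpy as np

def aggregate_grid_point(g, neighbor_xyz, neighbor_feats, W, b):
    """Compute eta = max_k Psi([v_k - g ; phi_k]) for one grid point.

    neighbor_xyz:   (K, 3) coordinates v_k of the grouped voxels
    neighbor_feats: (K, C) voxel features phi_k
    W, b:           weights of a one-layer ReLU MLP standing in for Psi
    """
    if len(neighbor_xyz) == 0:
        # Assumption: grid points with no neighbors get a zero feature.
        return np.zeros(W.shape[1])
    rel = neighbor_xyz - g                              # relative coordinates
    x = np.concatenate([rel, neighbor_feats], axis=1)   # (K, 3 + C)
    h = np.maximum(x @ W + b, 0.0)                      # Psi applied per neighbor
    return h.max(axis=0)                                # channel-wise max over K

def pool_multi_scale(g, groups, W, b):
    """Concatenate the aggregated features of several neighbor groups,
    one (xyz, feats) pair per stage/scale combination."""
    return np.concatenate(
        [aggregate_grid_point(g, xyz, feats, W, b) for xyz, feats in groups]
    )

# Usage with synthetic data: 5 neighbors, 4 input channels, 8 output channels.
rng = np.random.default_rng(0)
g = np.zeros(3)
xyz = rng.normal(size=(5, 3))
feats = rng.normal(size=(5, 4))
W = rng.normal(size=(3 + 4, 8))
b = np.zeros(8)
eta = aggregate_grid_point(g, xyz, feats, W, b)                      # shape (8,)
roi_feat = pool_multi_scale(g, [(xyz[:2], feats[:2]), (xyz, feats)], W, b)  # shape (16,)
```

Because the maximum is taken over the neighbor axis, the result does not depend on the order in which the voxels were grouped.
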
Papers Using This Method

VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework for Multi-Modal 3D Object Detection (2024-01-05)
Reviewing 3D Object Detectors in the Context of High-Resolution 3+1D Radar (2023-08-10)
Cost-Aware Evaluation and Model Scaling for LiDAR-Based 3D Object Detection (2022-05-02)
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection (2021-07-30)
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection (2020-12-31)