VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Yin Zhou, Oncel Tuzel

Date: 2017-11-17
Venue: CVPR 2018
Tasks: Feature Engineering, Birds Eye View Object Detection, Descriptive, Region Proposal, Object Localization, 3D Object Detection, Object Detection

Abstract

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need for manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms the group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based only on LiDAR.
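
The voxel partitioning step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`voxelize`, `pool_voxel`), the `max_points` cap, and the first-come truncation rule are hypothetical choices made for illustration, and the max pooling helper is only a fixed stand-in for the learned VFE layer, which the abstract does not specify in detail.

```python
import numpy as np

def voxelize(points, voxel_size, range_min, range_max, max_points=35):
    """Group points of shape (N, 3+) into equally spaced 3D voxels.

    Returns a dict mapping an integer voxel index (ix, iy, iz) to an
    array holding the points that fall inside that voxel.
    `max_points` is an assumed per-voxel cap, not a value from the source.
    """
    points = np.asarray(points, dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    range_min = np.asarray(range_min, dtype=np.float64)
    range_max = np.asarray(range_max, dtype=np.float64)

    # Discard points that lie outside the region of interest.
    xyz = points[:, :3]
    inside = np.all((xyz >= range_min) & (xyz < range_max), axis=1)
    points = points[inside]

    # Integer grid coordinates of the voxel containing each point.
    coords = np.floor((points[:, :3] - range_min) / voxel_size).astype(np.int64)

    voxels = {}
    for coord, point in zip(map(tuple, coords.tolist()), points):
        bucket = voxels.setdefault(coord, [])
        if len(bucket) < max_points:  # simple first-come truncation (assumption)
            bucket.append(point)
    return {coord: np.stack(bucket) for coord, bucket in voxels.items()}

def pool_voxel(voxel_points):
    """Stand-in for the learned VFE layer: element-wise max over the points."""
    return voxel_points.max(axis=0)

if __name__ == "__main__":
    cloud = np.array([
        [0.1, 0.1, 0.1],
        [0.2, 0.3, 0.1],
        [1.5, 0.2, 0.1],
        [9.0, 9.0, 9.0],  # outside the range, dropped
    ])
    grid = voxelize(cloud, voxel_size=(1.0, 1.0, 1.0),
                    range_min=(0.0, 0.0, 0.0), range_max=(2.0, 2.0, 2.0))
    for index, pts in sorted(grid.items()):
        print(index, len(pts), pool_voxel(pts))
```

Only non-empty voxels are stored, which matters because a LiDAR point cloud is highly sparse and most cells of the grid contain no points. In the network described above, a learned encoding would take the place of `pool_voxel`, and the per-voxel features would form the volumetric representation passed to the RPN.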

Results

Task	Dataset	Metric	Value	Model
Birds Eye View Object Detection	KITTI Pedestrian Moderate val	AP	61.05	VoxelNet
Birds Eye View Object Detection	KITTI Cars Hard val	AP	78.57	VoxelNet
Birds Eye View Object Detection	KITTI Cyclist Hard val	AP	50.49	VoxelNet
Birds Eye View Object Detection	KITTI Pedestrian Easy val	AP	65.95	VoxelNet
Birds Eye View Object Detection	KITTI Cyclist Moderate val	AP	52.18	VoxelNet
Birds Eye View Object Detection	KITTI Cars Hard	AP	77.39	VoxelNet
Birds Eye View Object Detection	KITTI Pedestrian Hard val	AP	56.98	VoxelNet
Birds Eye View Object Detection	KITTI Cars Moderate val	AP	84.81	VoxelNet
Birds Eye View Object Detection	KITTI Cars Easy val	AP	89.6	VoxelNet
Birds Eye View Object Detection	KITTI Cyclist Easy val	AP	74.41	VoxelNet
