DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, LiWei Wang

2023-01-15 · CVPR 2023 · Tasks: 3D Object Detection, Object Detection

Abstract

Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D perception. Compared with customized sparse convolution, the attention mechanism in Transformers is better suited to flexibly modeling long-range relationships and is easier to deploy in real-world applications. However, because point clouds are sparse, applying a standard Transformer to them is non-trivial. In this paper, we present the Dynamic Sparse Voxel Transformer (DSVT), a single-stride, window-based voxel Transformer backbone for outdoor 3D perception. To process sparse points efficiently in parallel, we propose Dynamic Sparse Window Attention, which partitions each window into a series of local regions according to its sparsity and then computes the features of all regions fully in parallel. To allow connections across sets, we design a rotated set partitioning strategy that alternates between two partitioning configurations in consecutive self-attention layers. To support effective downsampling and better encode geometric information, we also propose an attention-style 3D pooling module for sparse points, which is powerful and deployment-friendly because it does not rely on any customized CUDA operations. Our model achieves state-of-the-art performance on a broad range of 3D perception tasks. More importantly, DSVT can easily be deployed with TensorRT at real-time inference speed (27 Hz). Code will be available at https://github.com/Haiyang-W/DSVT.
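
The set partitioning and rotation described in the abstract can be illustrated with a minimal NumPy sketch for a single window. It rests on several assumptions that are not taken from the paper: a 2D (x, y) voxel grid, a hypothetical maximum set size `tau`, evenly spaced sampling over the sorted voxels (repeating voxels when a window does not fill its last set), and single-head attention with identity query/key/value projections and no masking. The function names `partition_window` and `set_attention` are hypothetical, and the official implementation may differ in its sorting, padding and masking details.

```python
import math

import numpy as np


def partition_window(coords: np.ndarray, tau: int, axis: str) -> np.ndarray:
    """Split the N voxels of one window into S = ceil(N / tau) sets of exactly tau indices.

    coords: (N, 2) integer (x, y) voxel coordinates inside one window.
    axis:   "x" sorts voxels x-major, "y" sorts them y-major; alternating
            the two between layers gives two rotated set configurations.
    Returns an (S, tau) array of row indices into coords.
    """
    n = len(coords)
    num_sets = math.ceil(n / tau)  # number of sets adapts to the window's occupancy
    # np.lexsort uses the LAST key as the primary sort key.
    if axis == "x":
        order = np.lexsort((coords[:, 1], coords[:, 0]))
    elif axis == "y":
        order = np.lexsort((coords[:, 0], coords[:, 1]))
    else:
        raise ValueError(f"axis must be 'x' or 'y', got {axis!r}")
    # Spread S * tau slots evenly over the n sorted voxels. Since S * tau >= n,
    # every voxel is used at least once; some are repeated to fill the sets.
    slots = np.arange(num_sets * tau)
    picked = (slots * n) // (num_sets * tau)
    return order[picked].reshape(num_sets, tau)


def set_attention(feats: np.ndarray, sets: np.ndarray) -> np.ndarray:
    """Single-head self-attention inside every set, computed as one batched op.

    Simplification (assumption): identity Q/K/V projections, no masking of repeats.
    """
    c = feats.shape[1]
    x = feats[sets]  # (S, tau, C): gather each set's features
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(c)  # (S, tau, tau)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out_sets = weights @ x  # (S, tau, C)
    # Scatter back to voxels, averaging over repeated copies of the same voxel.
    flat_idx = sets.reshape(-1)
    out = np.zeros_like(feats)
    count = np.zeros(len(feats))
    np.add.at(out, flat_idx, out_sets.reshape(-1, c))
    np.add.at(count, flat_idx, 1.0)
    return out / count[:, None]


rng = np.random.default_rng(0)
cells = rng.choice(144, size=50, replace=False)  # 50 distinct occupied cells
coords = np.stack([cells % 12, cells // 12], axis=1)  # in a 12x12 window
feats = rng.standard_normal((50, 8))
sets_x = partition_window(coords, tau=16, axis="x")  # (4, 16): ceil(50 / 16) = 4 sets
sets_y = partition_window(coords, tau=16, axis="y")
out = set_attention(feats, sets_x)  # layer 1: x-major sets
out = set_attention(out, sets_y)    # layer 2: rotated, y-major sets
print(sets_x.shape, out.shape)
```

Because every set in this sketch holds exactly `tau` indices, the sets of all windows could be concatenated into one `(total_sets, tau)` index array and handled by a single batched attention call, which is the intuition behind computing all regions in parallel. Alternating `axis` between layers lets voxels that landed in different sets in one layer exchange information in the next.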

Results

Task	Dataset	Metric	Value	Model
3D Object Detection	nuScenes (LiDAR only)	NDS	72.7	DSVT
3D Object Detection	nuScenes (LiDAR only)	NDS (val)	71.1	DSVT
3D Object Detection	nuScenes (LiDAR only)	mAP	68.4	DSVT
3D Object Detection	nuScenes (LiDAR only)	mAP (val)	66.4	DSVT
3D Object Detection	nuScenes	NDS	0.73	DSVT
3D Object Detection	nuScenes	mAAE (1 − acc.)	0.14	DSVT
3D Object Detection	nuScenes	mAOE (rad)	0.3	DSVT
3D Object Detection	nuScenes	mASE (1 − IoU)	0.23	DSVT
3D Object Detection	nuScenes	mATE (m)	0.25	DSVT
3D Object Detection	nuScenes	mAVE (m/s)	0.25	DSVT
3D Object Detection	Waymo Open Dataset	mAPH/L2	72.1	DSVT
3D Object Detection	Waymo Open Dataset (cyclist)	APH/L2	78	DSVT (val)
3D Object Detection	Waymo Open Dataset (vehicle)	APH/L2	74.1	DSVT (val)
3D Object Detection	Waymo Open Dataset (vehicle)	L1 mAP	82.1	DSVT (val)
3D Object Detection	Waymo Open Dataset (pedestrian)	APH/L2	76.4	DSVT (val)
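
The nuScenes rows can be cross-checked with the standard nuScenes Detection Score, NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))], where the sum runs over the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE). The sketch below assumes that the LiDAR-only test mAP (68.4) and the five error metrics come from the same test-set submission; the helper name `nds` is hypothetical.

```python
def nds(mean_ap: float, tp_errors: dict) -> float:
    """nuScenes Detection Score from mAP and the five mean TP error metrics."""
    if len(tp_errors) != 5:
        raise ValueError("expected mATE, mASE, mAOE, mAVE and mAAE")
    # Each error is clipped at 1 and converted into a score in [0, 1].
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors.values())
    # mAP is weighted 5x, so it contributes half of the total score.
    return (5.0 * mean_ap + tp_score) / 10.0


# Error metrics from the nuScenes rows of the results table (assumed same submission).
errors = {"mATE": 0.25, "mASE": 0.23, "mAOE": 0.30, "mAVE": 0.25, "mAAE": 0.14}
score = nds(0.684, errors)  # mAP 68.4 from the LiDAR-only row, as a fraction
print(f"NDS = {score:.3f}")
```

The result, about 0.725, agrees with the reported NDS of 0.73 (72.7) to within the rounding of the two-decimal error metrics.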