DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, LiWei Wang

Published: 2023-01-15
Venue: CVPR 2023
Tasks: 3D Object Detection, Object Detection

Abstract

Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D perception. Compared with customized sparse convolution, the attention mechanism in Transformers is better suited to flexibly modeling long-range relationships and is easier to deploy in real-world applications. However, because point clouds are sparse, applying a standard Transformer to sparse points is non-trivial. In this paper, we present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception. To efficiently process sparse points in parallel, we propose Dynamic Sparse Window Attention, which partitions a series of local regions in each window according to its sparsity and then computes the features of all regions in a fully parallel manner. To allow cross-set connections, we design a rotated set partitioning strategy that alternates between two partitioning configurations in consecutive self-attention layers. To support effective downsampling and better encode geometric information, we also propose an attention-style 3D pooling module on sparse points, which is powerful and deployment-friendly because it uses no customized CUDA operations. Our model achieves state-of-the-art performance on a broad range of 3D perception tasks. More importantly, DSVT can be easily deployed with TensorRT at real-time inference speed (27 Hz). Code will be available at https://github.com/Haiyang-W/DSVT.
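
The two ideas in the abstract that are easiest to misread, sparsity-dependent set partitioning and the alternation between two partitioning configurations, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that the two configurations differ only in which in-window axis is the primary sort key, that sets have a fixed capacity (`set_size`, a hypothetical parameter), and that attention uses the raw features as queries, keys, and values with no learned projections. All function names are hypothetical.

```python
import numpy as np

def partition_window_into_sets(coords, set_size, axis_major):
    """Group the non-empty voxels of one window into fixed-size sets.

    coords: (N, 2) int array of in-window (x, y) voxel coordinates.
    set_size: assumed cap on voxels per set (hypothetical parameter).
    axis_major: "x" or "y", the primary sort key (assumed configuration).
    Returns an (S, set_size) array of voxel indices, padded with -1.
    """
    n = len(coords)
    if axis_major == "x":
        order = np.lexsort((coords[:, 1], coords[:, 0]))  # sort by x, then y
    else:
        order = np.lexsort((coords[:, 0], coords[:, 1]))  # sort by y, then x
    num_sets = -(-n // set_size)  # ceil(n / set_size): sparser windows get fewer sets
    padded = np.full(num_sets * set_size, -1, dtype=np.int64)
    padded[:n] = order
    return padded.reshape(num_sets, set_size)

def set_attention(features, sets):
    """Self-attention restricted to each set, computed for all sets at once."""
    valid = sets >= 0                                   # (S, T) padding mask
    x = features[np.where(valid, sets, 0)]              # (S, T, C) gathered features
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores = np.where(valid[:, None, :], scores, -np.inf)  # padded keys get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = features.copy()
    out[sets[valid]] = (weights @ x)[valid]             # scatter back, skipping padding
    return out

# Five non-empty voxels in one window, with 4-dimensional features.
coords = np.array([[0, 0], [0, 3], [1, 1], [2, 0], [3, 2]])
feats = np.arange(20, dtype=np.float64).reshape(5, 4) / 10.0

# Alternate the partitioning configuration in consecutive layers.
for layer in range(2):
    axis = "x" if layer % 2 == 0 else "y"
    sets = partition_window_into_sets(coords, set_size=2, axis_major=axis)
    feats = set_attention(feats, sets)
```

Because every set is padded to the same length, all sets can be stacked into one batch and attended to in a single call, regardless of how many voxels each window holds. Changing the sort axis between layers regroups the voxels, so information can flow between voxels that were in different sets in the previous layer.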

Results

Task: 3D Object Detection

Dataset               Metric      Value   Model
nuScenes LiDAR only   NDS         72.7    DSVT
nuScenes LiDAR only   NDS (val)   71.1    DSVT
nuScenes LiDAR only   mAP         68.4    DSVT
nuScenes LiDAR only   mAP (val)   66.4    DSVT
nuScenes              NDS         0.73    DSVT
nuScenes              mAAE        0.14    DSVT
nuScenes              mAOE        0.3     DSVT
nuScenes              mASE        0.23    DSVT
nuScenes              mATE        0.25    DSVT
nuScenes              mAVE        0.25    DSVT
Waymo Open Dataset    mAPH/L2     72.1    DSVT
waymo cyclist         APH/L2      78      DSVT(val)
waymo vehicle         APH/L2      74.1    DSVT(val)
waymo vehicle         L1 mAP      82.1    DSVT(val)
waymo pedestrian      APH/L2      76.4    DSVT(val)

The same results are indexed under the tasks Object Detection, 3D, 3D Object Detection, 2D Classification, 2D Object Detection, and 16k.
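
The nuScenes rows above are related through the nuScenes Detection Score, which combines mAP with the five mean true-positive errors: NDS = (5 * mAP + sum over errors of (1 - min(1, error))) / 10. The sketch below recomputes NDS from the listed values. It assumes that the mAP of 68.4 in the "nuScenes LiDAR only" rows and the error metrics in the "nuScenes" rows describe the same test-set submission, which the page does not state. The function name is hypothetical.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """NDS = (5 * mAP + sum(1 - min(1, err))) / 10 over the five TP errors."""
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0

# Values copied from the results table; pairing them is an assumption.
tp_errors = {
    "mATE": 0.25,  # translation error
    "mASE": 0.23,  # scale error
    "mAOE": 0.30,  # orientation error
    "mAVE": 0.25,  # velocity error
    "mAAE": 0.14,  # attribute error
}
nds = nuscenes_detection_score(0.684, tp_errors)
print(f"NDS = {nds:.3f}")
```

Under that assumption the result is about 0.725, which agrees with the reported NDS of 72.7 (0.73) up to the rounding of the listed error values.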
