
SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection

Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Felix Fent, Gerhard Rigoll

2024-11-29 · Autonomous Driving · Object Localization · Depth Estimation · 3D Multi-Object Tracking · Object Detection · 3D Object Detection

Abstract

In this work, we present SpaRC, a novel Sparse fusion transformer for 3D perception that integrates multi-view image semantics with Radar and Camera point features. The fusion of radar and camera modalities has emerged as an efficient perception paradigm for autonomous driving systems. While conventional approaches utilize dense Bird's Eye View (BEV)-based architectures for depth estimation, contemporary query-based transformers excel in camera-only detection through object-centric methodology. However, these query-based approaches exhibit limitations in false positive detections and localization precision due to implicit depth modeling. We address these challenges through three key contributions: (1) sparse frustum fusion (SFF) for cross-modal feature alignment, (2) range-adaptive radar aggregation (RAR) for precise object localization, and (3) local self-attention (LSA) for focused query aggregation. In contrast to existing methods requiring computationally intensive BEV-grid rendering, SpaRC operates directly on encoded point features, yielding substantial improvements in efficiency and accuracy. Empirical evaluations on the nuScenes and TruckScenes benchmarks demonstrate that SpaRC significantly outperforms existing dense BEV-based and sparse query-based detectors. Our method achieves state-of-the-art performance metrics of 67.1 NDS and 63.1 AMOTA. The code and pretrained models are available at https://github.com/phi-wol/sparc.
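To make the third contribution concrete, below is a minimal sketch of local self-attention over object queries, assuming each query attends only to its k nearest neighbours in the BEV plane. All names, shapes, and defaults are illustrative assumptions, and the learned Q/K/V projections and feed-forward layers of a full transformer block are omitted; this is not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of local self-attention (LSA) over object queries.
# Assumption: each query attends only to its k nearest neighbours in
# the BEV plane instead of all N queries; projections/FFN omitted.
import torch

def local_self_attention(queries, centers, k=16, num_heads=8):
    """queries: (N, C) query features; centers: (N, 2) BEV positions."""
    N, C = queries.shape
    head_dim = C // num_heads

    # k nearest neighbours (self included) by pairwise BEV distance.
    dists = torch.cdist(centers, centers)       # (N, N)
    knn = dists.topk(k, largest=False).indices  # (N, k)

    q = queries.view(N, num_heads, head_dim)           # (N, H, D)
    kv = queries[knn].view(N, k, num_heads, head_dim)  # (N, k, H, D)

    # Scaled dot-product attention restricted to the local neighbourhood.
    attn = torch.einsum('nhd,nkhd->nhk', q, kv) / head_dim ** 0.5
    attn = attn.softmax(dim=-1)
    out = torch.einsum('nhk,nkhd->nhd', attn, kv).reshape(N, C)
    return queries + out  # residual connection

# Toy usage: 900 queries with 256-dim features on a 100 m x 100 m BEV grid.
q = torch.randn(900, 256)
xy = torch.rand(900, 2) * 100.0
print(local_self_attention(q, xy).shape)  # torch.Size([900, 256])
```

The point of restricting attention to a local neighbourhood is that cost scales with N·k rather than N², which is what lets a sparse, query-centric design avoid dense BEV-grid rendering.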

Results

Task                      Dataset                  Metric  Value  Model
3D Multi-Object Tracking  nuScenes (Camera-Radar)  AMOTA   0.631  SpaRC
3D Object Detection       nuScenes                 NDS     0.699  SpaRC
3D Object Detection       nuScenes                 mAP     0.646  SpaRC
3D Object Detection       nuScenes (Camera-Radar)  NDS     0.699  SpaRC
3D Object Detection       TruckScenes              NDS     0.374  SpaRC
3D Object Detection       TruckScenes              mAP     0.272  SpaRC

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)