Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki

Date: 2022-06-16
Tasks: Autonomous Vehicles, Data Augmentation, Bird's-Eye View Semantic Segmentation
Abstract

Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first of all attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect -- even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to re-consider this commonly-neglected part of the sensor platform.
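The "simple parameter-free lifter" mentioned in the abstract can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a pinhole camera model, a single-channel feature map, and that each voxel centre is already expressed in every camera's coordinate frame. The names `project`, `bilinear_sample`, and `lift_voxel` are hypothetical and chosen for illustration. The idea shown is that a 3D location gets its feature by projecting into each image, bilinearly sampling there, and averaging over the cameras that actually see it, with no learned parameters involved.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]             # (x, y, z) in a camera's frame
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), pinhole model
FeatureMap = List[List[float]]                  # H x W, one channel for brevity


def project(point: Point, intr: Intrinsics) -> Optional[Tuple[float, float]]:
    """Pinhole projection to pixel coordinates; None if the point is behind the camera."""
    x, y, z = point
    fx, fy, cx, cy = intr
    if z <= 0.0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(feat: FeatureMap, u: float, v: float) -> Optional[float]:
    """Bilinearly interpolate feat at sub-pixel (u, v); None if outside the map."""
    h, w = len(feat), len(feat[0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return None
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = feat[v0][u0] * (1 - du) + feat[v0][u1] * du
    bottom = feat[v1][u0] * (1 - du) + feat[v1][u1] * du
    return top * (1 - dv) + bottom * dv


def lift_voxel(
    points_per_camera: Sequence[Point],
    feats: Sequence[FeatureMap],
    intrinsics: Sequence[Intrinsics],
) -> float:
    """Average the sampled feature over every camera that sees the voxel centre."""
    samples = []
    for point, feat, intr in zip(points_per_camera, feats, intrinsics):
        uv = project(point, intr)
        if uv is None:
            continue  # voxel is behind this camera
        value = bilinear_sample(feat, *uv)
        if value is not None:
            samples.append(value)
    # Assumption: voxels seen by no camera get a zero feature.
    return sum(samples) / len(samples) if samples else 0.0


# Toy usage: two cameras share a 2x2 feature map; the second one faces away.
toy_feat = [[0.0, 1.0], [2.0, 3.0]]
toy_intr = (1.0, 1.0, 0.5, 0.5)
lifted = lift_voxel(
    [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)],
    [toy_feat, toy_feat],
    [toy_intr, toy_intr],
)
```

Repeating this for every cell of a 3D grid around the vehicle, and then reducing the vertical axis, would yield a BEV feature map; how that reduction is done is left out here because the abstract does not specify it.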

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | nuScenes | IoU veh - 224x480 - No vis filter - 100x100 at 0.5 | 36.9 | Simple-BEV
Semantic Segmentation | nuScenes | IoU veh - 224x480 - Vis filter. - 100x100 at 0.5 | 43 | Simple-BEV
Semantic Segmentation | nuScenes | IoU veh - 448x800 - No vis filter - 100x100 at 0.5 | 40.9 | Simple-BEV
Semantic Segmentation | nuScenes | IoU veh - 448x800 - Vis filter. - 100x100 at 0.5 | 46.6 | Simple-BEV
Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Long | 44.5 | Simple-BEV (EfficientNet-b4)
Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Short | 70.4 | Simple-BEV (EfficientNet-b4)
Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Long | 43.6 | Simple-BEV (ResNet-50)
Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Short | 70.7 | Simple-BEV (ResNet-50)
10-shot image generation | nuScenes | IoU veh - 224x480 - No vis filter - 100x100 at 0.5 | 36.9 | Simple-BEV
10-shot image generation | nuScenes | IoU veh - 224x480 - Vis filter. - 100x100 at 0.5 | 43 | Simple-BEV
10-shot image generation | nuScenes | IoU veh - 448x800 - No vis filter - 100x100 at 0.5 | 40.9 | Simple-BEV
10-shot image generation | nuScenes | IoU veh - 448x800 - Vis filter. - 100x100 at 0.5 | 46.6 | Simple-BEV
10-shot image generation | Lyft Level 5 | IoU vehicle - 224x480 - Long | 44.5 | Simple-BEV (EfficientNet-b4)
10-shot image generation | Lyft Level 5 | IoU vehicle - 224x480 - Short | 70.4 | Simple-BEV (EfficientNet-b4)
10-shot image generation | Lyft Level 5 | IoU vehicle - 224x480 - Long | 43.6 | Simple-BEV (ResNet-50)
10-shot image generation | Lyft Level 5 | IoU vehicle - 224x480 - Short | 70.7 | Simple-BEV (ResNet-50)
Bird's-Eye View Semantic Segmentation | nuScenes | IoU veh - 224x480 - No vis filter - 100x100 at 0.5 | 36.9 | Simple-BEV
Bird's-Eye View Semantic Segmentation | nuScenes | IoU veh - 224x480 - Vis filter. - 100x100 at 0.5 | 43 | Simple-BEV
Bird's-Eye View Semantic Segmentation | nuScenes | IoU veh - 448x800 - No vis filter - 100x100 at 0.5 | 40.9 | Simple-BEV
Bird's-Eye View Semantic Segmentation | nuScenes | IoU veh - 448x800 - Vis filter. - 100x100 at 0.5 | 46.6 | Simple-BEV
Bird's-Eye View Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Long | 44.5 | Simple-BEV (EfficientNet-b4)
Bird's-Eye View Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Short | 70.4 | Simple-BEV (EfficientNet-b4)
Bird's-Eye View Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Long | 43.6 | Simple-BEV (ResNet-50)
Bird's-Eye View Semantic Segmentation | Lyft Level 5 | IoU vehicle - 224x480 - Short | 70.7 | Simple-BEV (ResNet-50)
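
For readers unfamiliar with the metric, the IoU values above could be computed along the following lines. This is a hedged sketch, not the benchmark's evaluation code: it assumes that "100x100 at 0.5" denotes a 100 m x 100 m region rasterised at 0.5 m per cell, that predictions and labels are binary vehicle masks on that grid, and that reported values are the resulting fraction multiplied by 100. The helper names `grid_cells` and `binary_iou` are hypothetical.

```python
from typing import Sequence


def grid_cells(extent_m: float, resolution_m: float) -> int:
    """Number of cells along one side of a square BEV grid."""
    return round(extent_m / resolution_m)


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks of identical shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Under the stated reading, the grid has 200 cells per side.
side = grid_cells(100.0, 0.5)

# Toy 2x2 masks: one cell overlaps, three cells are covered in total.
toy_pred = [[1, 1], [0, 0]]
toy_target = [[1, 0], [1, 0]]
toy_iou = binary_iou(toy_pred, toy_target)
```

How the "Vis filter" rows restrict the set of labelled vehicles is not stated on this page, so it is not modelled in the sketch.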

Related Papers

2025-07-17  Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management
2025-07-17  Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images
2025-07-16  Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios
2025-07-16  Similarity-Guided Diffusion for Contrastive Sequential Recommendation
2025-07-15  Data Augmentation in Time Series Forecasting through Inverted Framework
2025-07-14  Iceberg: Enhancing HLS Modeling with Synthetic Data
2025-07-13  AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs)
2025-07-11  FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation