Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki

2022-06-16
Autonomous Vehicles, Data Augmentation, Bird's-Eye View Semantic Segmentation

Abstract

Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first of all attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect -- even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to re-consider this commonly-neglected part of the sensor platform.
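
To make the phrase "parameter-free lifter" more concrete, here is a minimal sketch of one way such a lifter could work: project each 3D point into every camera, bilinearly sample the image feature map at the projected pixel, and average over the cameras that see the point. The abstract does not spell out these details, so treat them as assumptions. The function names (`project`, `bilinear_sample`, `lift`), the pinhole camera model, and the single-channel feature maps are all hypothetical simplifications, and nothing here is taken from the authors' code.

```python
from math import floor


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward).

    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    x, y, z = point
    if z <= 0.0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(feat, u, v):
    """Bilinearly interpolate feat[row][col] at pixel (u, v).

    Assumes the map is at least 2x2. Returns None outside the image bounds.
    """
    h, w = len(feat), len(feat[0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return None
    # Clamp the top-left corner so the 2x2 neighbourhood stays inside the map.
    u0 = min(int(floor(u)), w - 2)
    v0 = min(int(floor(v)), h - 2)
    du, dv = u - u0, v - v0
    top = (1 - du) * feat[v0][u0] + du * feat[v0][u0 + 1]
    bottom = (1 - du) * feat[v0 + 1][u0] + du * feat[v0 + 1][u0 + 1]
    return (1 - dv) * top + dv * bottom


def lift(points_per_camera, feats, intrinsics):
    """Average sampled features over the cameras that see each 3D point.

    points_per_camera[c][i] is point i expressed in camera c's frame,
    feats[c] is that camera's 2D feature map, and intrinsics[c] is
    (fx, fy, cx, cy). Points seen by no camera get 0.0.
    There are no learned weights anywhere in this step.
    """
    num_points = len(points_per_camera[0])
    lifted = []
    for i in range(num_points):
        samples = []
        for pts, feat, (fx, fy, cx, cy) in zip(points_per_camera, feats, intrinsics):
            uv = project(pts[i], fx, fy, cx, cy)
            if uv is None:
                continue
            value = bilinear_sample(feat, uv[0], uv[1])
            if value is not None:
                samples.append(value)
        lifted.append(sum(samples) / len(samples) if samples else 0.0)
    return lifted
```

In a full model the points would be the cell centers of a 3D grid around the vehicle, the feature maps would have many channels, and the sampling would run as a batched, differentiable tensor operation. The sketch only shows the geometry.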

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	nuScenes	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	36.9	Simple-BEV
Semantic Segmentation	nuScenes	IoU veh - 224x480 - Vis filter. - 100x100 at 0.5	43	Simple-BEV
Semantic Segmentation	nuScenes	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	40.9	Simple-BEV
Semantic Segmentation	nuScenes	IoU veh - 448x800 - Vis filter. - 100x100 at 0.5	46.6	Simple-BEV
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	44.5	Simple-BEV (EfficientNet-b4)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	70.4	Simple-BEV (EfficientNet-b4)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	43.6	Simple-BEV (ResNet-50)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	70.7	Simple-BEV (ResNet-50)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	36.9	Simple-BEV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 224x480 - Vis filter. - 100x100 at 0.5	43	Simple-BEV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	40.9	Simple-BEV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 448x800 - Vis filter. - 100x100 at 0.5	46.6	Simple-BEV
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	44.5	Simple-BEV (EfficientNet-b4)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	70.4	Simple-BEV (EfficientNet-b4)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	43.6	Simple-BEV (ResNet-50)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	70.7	Simple-BEV (ResNet-50)
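
The metric in every row above is intersection-over-union (IoU) on the vehicle class. As an illustration, the sketch below computes IoU between two binary BEV masks and reports it as a percentage, which appears to be the scale used in the table. It also shows the grid arithmetic under one reading of the label "100x100 at 0.5", namely a 100 m x 100 m area at 0.5 m per cell. That reading is an assumption, as are the helper names `grid_cells` and `binary_iou` and the convention of returning 1.0 when both masks are empty.

```python
def grid_cells(extent_m, resolution_m):
    """Number of cells along one side of a square BEV grid."""
    return round(extent_m / resolution_m)


def binary_iou(pred, target):
    """IoU between two equally sized binary masks given as nested lists.

    Returns 1.0 when both masks are empty (an assumed convention).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0


# Tiny 2x3 example: the masks overlap in 2 cells and cover 4 cells in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou_percent = 100.0 * binary_iou(pred, target)
side = grid_cells(100.0, 0.5)
```

With these toy masks the IoU is 2 / 4, so `iou_percent` is 50.0, and `side` is 200 cells per side under the assumed reading of the label.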

