PointBeV: A Sparse Approach to BeV Predictions

Loick Chambon, Eloi Zablocki, Mickael Chen, Florent Bartoccioni, Patrick Perez, Matthieu Cord

2023-12-01

Bird's-Eye View Semantic Segmentation, BEV Segmentation

Abstract

Bird's-eye View (BeV) representations have emerged as the de facto shared space in driving applications, offering a unified space for sensor data fusion and supporting various downstream tasks. However, conventional models use grids with fixed resolution and range, and they are computationally inefficient because resources are allocated uniformly across all cells. To address this, we propose PointBeV, a novel sparse BeV segmentation model operating on sparse BeV cells instead of dense grids. This approach offers precise control over memory usage, enabling the use of long temporal contexts and accommodating memory-constrained platforms. PointBeV employs an efficient two-pass strategy for training, enabling focused computation on regions of interest. At inference time, it can be used with various memory/performance trade-offs and flexibly adjusts to new specific use cases. PointBeV achieves state-of-the-art results on the nuScenes dataset for vehicle, pedestrian, and lane segmentation, showcasing superior performance in static and temporal settings despite being trained solely with sparse signals. We release our code along with two new efficient modules used in the architecture: Sparse Feature Pulling, designed for the effective extraction of features from images to BeV, and Submanifold Attention, which enables efficient temporal modeling. The code is available at https://github.com/valeoai/PointBeV.
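The two-pass idea mentioned in the abstract can be illustrated with a toy coarse-to-fine selection of BeV cells. This is only a minimal sketch under stated assumptions, not the released implementation: the strided coarse pattern, the square refinement window, the parameter names (`stride`, `n_anchors`, `radius`) and the `toy_score` function standing in for the network are all hypothetical.

```python
import numpy as np

def coarse_points(grid_size, stride):
    """First pass: a regular, strided subset of BeV cell indices (assumed pattern)."""
    ys, xs = np.meshgrid(
        np.arange(0, grid_size, stride),
        np.arange(0, grid_size, stride),
        indexing="ij",
    )
    return np.stack([ys.ravel(), xs.ravel()], axis=1)

def fine_points(coarse, scores, grid_size, n_anchors, radius):
    """Second pass: densify a square window around the top-scoring coarse cells."""
    anchors = coarse[np.argsort(scores)[::-1][:n_anchors]]
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    window = np.stack([dy.ravel(), dx.ravel()], axis=1)
    candidates = (anchors[:, None, :] + window[None, :, :]).reshape(-1, 2)
    # Drop cells that fall outside the grid, then remove duplicates.
    inside = np.all((candidates >= 0) & (candidates < grid_size), axis=1)
    return np.unique(candidates[inside], axis=0)

def toy_score(points, target=(120, 80)):
    """Hypothetical stand-in for the model: scores are higher near a fake object."""
    dist = np.linalg.norm(points - np.array(target), axis=1)
    return np.exp(-dist / 10.0)

grid_size = 200  # assumed dense grid of 200 x 200 cells
coarse = coarse_points(grid_size, stride=8)
fine = fine_points(coarse, toy_score(coarse), grid_size, n_anchors=5, radius=3)
evaluated = np.unique(np.concatenate([coarse, fine]), axis=0)

print("coarse cells:", len(coarse))
print("fine cells:", len(fine))
print("cells evaluated:", len(evaluated), "of", grid_size * grid_size)
```

In this toy setting only a small fraction of the dense grid is ever evaluated, which is the kind of memory control the abstract describes; the number of anchors and the window size act as the knobs for the memory/performance trade-off.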

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	nuScenes	IoU ped - 224x480 - Vis filter. - 100x100 at 0.5	19.9	PointBeV
Semantic Segmentation	nuScenes	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	39.9	PointBeV
Semantic Segmentation	nuScenes	IoU veh - 224x480 - Vis filter. - 100x100 at 0.5	44.7	PointBeV
Semantic Segmentation	nuScenes	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	43.2	PointBeV
Semantic Segmentation	nuScenes	IoU veh - 448x800 - Vis filter. - 100x100 at 0.5	48.7	PointBeV
Semantic Segmentation	nuScenes	IoU lane - 224x480 - 100x100 at 0.5	49.6	PointBeV (static)
Semantic Segmentation	nuScenes	IoU ped - 224x480 - Vis filter. - 100x100 at 0.5	18.5	PointBeV (static)
Semantic Segmentation	nuScenes	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	38.7	PointBeV (static)
Semantic Segmentation	nuScenes	IoU veh - 224x480 - Vis filter. - 100x100 at 0.5	44.0	PointBeV (static)
Semantic Segmentation	nuScenes	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	42.1	PointBeV (static)
Semantic Segmentation	nuScenes	IoU veh - 448x800 - Vis filter. - 100x100 at 0.5	47.6	PointBeV (static)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	45.4	PointBeV (EfficientNet-b4)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	72.6	PointBeV (EfficientNet-b4)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	44.5	PointBeV (ResNet-50)
Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	72.3	PointBeV (ResNet-50)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU ped - 224x480 - Vis filter. - 100x100 at 0.5	19.9	PointBeV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	39.9	PointBeV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 224x480 - Vis filter. - 100x100 at 0.5	44.7	PointBeV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	43.2	PointBeV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 448x800 - Vis filter. - 100x100 at 0.5	48.7	PointBeV
Bird's-Eye View Semantic Segmentation	nuScenes	IoU lane - 224x480 - 100x100 at 0.5	49.6	PointBeV (static)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU ped - 224x480 - Vis filter. - 100x100 at 0.5	18.5	PointBeV (static)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	38.7	PointBeV (static)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 224x480 - Vis filter. - 100x100 at 0.5	44.0	PointBeV (static)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	42.1	PointBeV (static)
Bird's-Eye View Semantic Segmentation	nuScenes	IoU veh - 448x800 - Vis filter. - 100x100 at 0.5	47.6	PointBeV (static)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	45.4	PointBeV (EfficientNet-b4)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	72.6	PointBeV (EfficientNet-b4)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Long	44.5	PointBeV (ResNet-50)
Bird's-Eye View Semantic Segmentation	Lyft Level 5	IoU vehicle - 224x480 - Short	72.3	PointBeV (ResNet-50)
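For readers unfamiliar with the metric, the IoU values in the table can be related to a BeV grid with a small worked example. This is a sketch under assumptions: it reads "100x100 at 0.5" as a 100 m x 100 m area at 0.5 m per cell (the table does not spell this out), and the helper names `bev_grid_cells` and `iou_percent` are made up for illustration. The masks are synthetic, not benchmark data.

```python
import numpy as np

def bev_grid_cells(extent_m, resolution_m):
    """Number of cells along one side of a square BeV grid."""
    return int(round(extent_m / resolution_m))

def iou_percent(pred, target):
    """Intersection over union of two binary masks, as a percentage."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return float("nan")  # undefined when both masks are empty
    intersection = np.logical_and(pred, target).sum()
    return 100.0 * intersection / union

# Assumed reading of "100x100 at 0.5": 100 m extent, 0.5 m cells.
side = bev_grid_cells(100.0, 0.5)

# Synthetic ground truth: a 20 x 10 cell rectangle.
target = np.zeros((side, side), dtype=bool)
target[90:110, 95:105] = True

# Synthetic prediction: the same rectangle shifted by 5 cells along one axis.
pred = np.zeros_like(target)
pred[95:115, 95:105] = True

score = iou_percent(pred, target)
print(f"grid: {side} x {side}, IoU: {score:.1f}")
```

Here the two rectangles share 150 cells out of a union of 250, so a 5-cell (2.5 m under the assumed resolution) shift already drops IoU to 60, which gives a feel for how strict the metric is on small objects such as pedestrians.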
