PointBeV: A Sparse Approach to BeV Predictions

Loick Chambon, Eloi Zablocki, Mickael Chen, Florent Bartoccioni, Patrick Perez, Matthieu Cord

2023-12-01 | Bird's-Eye View Semantic Segmentation | BEV Segmentation

Abstract

Bird's-eye View (BeV) representations have emerged as the de-facto shared space in driving applications, offering a unified space for sensor data fusion and supporting various downstream tasks. However, conventional models use grids with fixed resolution and range and face computational inefficiencies due to the uniform allocation of resources across all cells. To address this, we propose PointBeV, a novel sparse BeV segmentation model operating on sparse BeV cells instead of dense grids. This approach offers precise control over memory usage, enabling the use of long temporal contexts and accommodating memory-constrained platforms. PointBeV employs an efficient two-pass strategy for training, enabling focused computation on regions of interest. At inference time, it can be used with various memory/performance trade-offs and flexibly adjusts to new specific use cases. PointBeV achieves state-of-the-art results on the nuScenes dataset for vehicle, pedestrian, and lane segmentation, showcasing superior performance in static and temporal settings despite being trained solely with sparse signals. We will release our code along with two new efficient modules used in the architecture: Sparse Feature Pulling, designed for the effective extraction of features from images to BeV, and Submanifold Attention, which enables efficient temporal modeling. Our code is available at https://github.com/valeoai/PointBeV.
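The idea of evaluating only selected BeV cells, first coarsely and then densely around promising regions, can be illustrated with a small sketch. This is one plausible reading of the "two-pass" description in the abstract, not code from the PointBeV repository: the function names (`coarse_cells`, `neighbours`, `two_pass`), the regular coarse stride, the square refinement window, and the threshold are all assumptions made for illustration, and `score_fn` stands in for whatever model scores a single cell.

```python
from typing import Callable, Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index of a BeV cell


def coarse_cells(size: int, stride: int) -> Set[Cell]:
    """First pass: a regular subsample of the size x size grid (assumed pattern)."""
    return {(i, j) for i in range(0, size, stride) for j in range(0, size, stride)}


def neighbours(cell: Cell, radius: int, size: int) -> Set[Cell]:
    """All in-bounds cells inside a square window centred on `cell`."""
    i0, j0 = cell
    return {
        (i, j)
        for i in range(max(0, i0 - radius), min(size, i0 + radius + 1))
        for j in range(max(0, j0 - radius), min(size, j0 + radius + 1))
    }


def two_pass(
    score_fn: Callable[[Cell], float],
    size: int,
    stride: int,
    radius: int,
    threshold: float,
) -> Dict[Cell, float]:
    """Score a coarse subset, then densify around cells scoring above `threshold`."""
    # Pass 1: score only the coarse subsample.
    scores = {c: score_fn(c) for c in coarse_cells(size, stride)}

    # Pass 2: collect windows around high-scoring coarse cells.
    fine: Set[Cell] = set()
    for cell, s in list(scores.items()):
        if s > threshold:
            fine |= neighbours(cell, radius, size)

    # Score only the cells that were not already evaluated in pass 1.
    for c in fine.difference(scores):
        scores[c] = score_fn(c)
    return scores


def toy_score(cell: Cell) -> float:
    """Hypothetical stand-in for a model: one 8x8 'vehicle' blob on the grid."""
    i, j = cell
    return 1.0 if 40 <= i < 48 and 60 <= j < 68 else 0.0


scores = two_pass(toy_score, size=200, stride=8, radius=4, threshold=0.5)
print(f"evaluated {len(scores)} of {200 * 200} cells")
```

In this toy setting only a small fraction of the 200x200 grid is ever scored, which is the kind of memory/compute control the abstract describes; the actual selection strategy used by PointBeV may differ.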

Results

Each row below is listed under three task labels on the source page: Semantic Segmentation, 10-shot image generation, and Bird's-Eye View Semantic Segmentation.

| Dataset | Metric | Value | Model |
|---|---|---|---|
| nuScenes | IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 | 19.9 | PointBeV |
| nuScenes | IoU veh - 224x480 - No vis filter - 100x100 at 0.5 | 39.9 | PointBeV |
| nuScenes | IoU veh - 224x480 - Vis filter. - 100x100 at 0.5 | 44.7 | PointBeV |
| nuScenes | IoU veh - 448x800 - No vis filter - 100x100 at 0.5 | 43.2 | PointBeV |
| nuScenes | IoU veh - 448x800 - Vis filter. - 100x100 at 0.5 | 48.7 | PointBeV |
| nuScenes | IoU lane - 224x480 - 100x100 at 0.5 | 49.6 | PointBeV (static) |
| nuScenes | IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 | 18.5 | PointBeV (static) |
| nuScenes | IoU veh - 224x480 - No vis filter - 100x100 at 0.5 | 38.7 | PointBeV (static) |
| nuScenes | IoU veh - 224x480 - Vis filter. - 100x100 at 0.5 | 44 | PointBeV (static) |
| nuScenes | IoU veh - 448x800 - No vis filter - 100x100 at 0.5 | 42.1 | PointBeV (static) |
| nuScenes | IoU veh - 448x800 - Vis filter. - 100x100 at 0.5 | 47.6 | PointBeV (static) |
| Lyft Level 5 | IoU vehicle - 224x480 - Long | 45.4 | PointBeV (EfficientNet-b4) |
| Lyft Level 5 | IoU vehicle - 224x480 - Short | 72.6 | PointBeV (EfficientNet-b4) |
| Lyft Level 5 | IoU vehicle - 224x480 - Long | 44.5 | PointBeV (ResNet-50) |
| Lyft Level 5 | IoU vehicle - 224x480 - Short | 72.3 | PointBeV (ResNet-50) |
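
All metrics above are intersection-over-union (IoU) scores for a single class on a BeV grid. As a reminder of what that number measures, here is a minimal sketch for one pair of binary masks. It assumes the reported values are percentages, and the function name `binary_iou` and the nested-list mask format are illustrative choices, not the evaluation code used by the authors.

```python
import math
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """IoU (in percent) between two binary masks of the same shape.

    Returns NaN when both masks are empty, since the ratio is undefined.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # cell occupied in both masks
            if p or t:
                union += 1  # cell occupied in at least one mask
    if union == 0:
        return math.nan
    return 100.0 * intersection / union


# Tiny 2x3 example: one shared cell out of three occupied cells in total.
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 0, 0]]
print(round(binary_iou(pred, target), 1))
```

Benchmark protocols usually accumulate intersections and unions over the whole validation set before dividing, rather than averaging per-sample IoU; which convention applies to the numbers above is not stated on this page.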

Related Papers

NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models (2025-07-05)
4D-ROLLS: 4D Radar Occupancy Learning via LiDAR Supervision (2025-05-20)
RESAR-BEV: An Explainable Progressive Residual Autoregressive Approach for Camera-Radar Fusion in BEV Segmentation (2025-05-10)
DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion (2025-05-03)
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving (2025-04-17)
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance (2025-03-05)
Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving (2025-03-02)
SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation (2025-02-27)