PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Aqi Gao, Tiancai Wang, Xiangyu Zhang, Jian Sun

2022-06-02 · ICCV 2023

Tasks: Segmentation, 3D Lane Detection, Bird's-Eye View Semantic Segmentation, Multi-Task Learning, BEV Segmentation, 3D Object Detection, Object Detection, Lane Detection


Abstract

In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Building on PETR, PETRv2 explores the effectiveness of temporal modeling, which uses the temporal information of previous frames to boost 3D object detection. More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling: the 3D PE achieves temporal alignment of object positions across different frames. A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE. To support multi-task learning (e.g., BEV segmentation and 3D lane detection), PETRv2 provides a simple yet effective solution by introducing task-specific queries, which are initialized in different spaces. PETRv2 achieves state-of-the-art performance on 3D object detection, BEV segmentation, and 3D lane detection. A detailed robustness analysis of the PETR framework is also conducted. We hope PETRv2 can serve as a strong baseline for 3D perception. Code is available at https://github.com/megvii-research/PETR.
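The abstract describes the 3D PE and its temporal alignment only at a high level. The NumPy sketch below illustrates one way the pieces could fit together, under stated assumptions: camera-frustum points (u·d, v·d, d) are back-projected with the camera intrinsics and a camera-to-LiDAR extrinsic, points from frame t-1 are moved into the frame-t coordinate system using ego poses, and the normalized coordinates are turned into an embedding gated by image features. All function names, the region-of-interest bounds, the pose values, and the single-layer sigmoid gate are hypothetical; this is not the official PETR implementation.

```python
import numpy as np


def frustum_points(h, w, depths):
    """Camera-frustum grid: homogeneous (u*d, v*d, d, 1) for every pixel (u, v)
    and depth bin d. Returns an array of shape (h, w, D, 4)."""
    v, u, d = np.meshgrid(np.arange(h), np.arange(w), depths, indexing="ij")
    return np.stack([u * d, v * d, d, np.ones_like(d)], axis=-1)


def to_lidar(points, intrinsics, cam_to_lidar):
    """Back-project frustum points into a (hypothetical) LiDAR/ego frame."""
    k4 = np.eye(4)
    k4[:3, :3] = intrinsics
    transform = cam_to_lidar @ np.linalg.inv(k4)
    return points @ transform.T


def align_to_current(points_prev, prev_lidar_to_global, cur_lidar_to_global):
    """Temporal alignment: express frame t-1 points in the frame-t coordinate system."""
    prev_to_cur = np.linalg.inv(cur_lidar_to_global) @ prev_lidar_to_global
    return points_prev @ prev_to_cur.T


def normalize(points, pc_range):
    """Map xyz into [0, 1] over an assumed region of interest [x0, y0, z0, x1, y1, z1]."""
    lo, hi = np.asarray(pc_range[:3]), np.asarray(pc_range[3:])
    return (points[..., :3] - lo) / (hi - lo)


def feature_guided_pe(coords_flat, feats, w_pos, w_gate):
    """Position embedding gated by image features: sigmoid(F W_g) * relu(P W_p).
    Single linear layers stand in for the MLPs (an assumption)."""
    pos = np.maximum(coords_flat @ w_pos, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(feats @ w_gate)))
    return gate * pos


# Toy usage: a 4x6 feature map, 8 depth bins, identity camera-to-LiDAR extrinsic.
h, w, embed_dim, channels = 4, 6, 16, 8
depths = np.linspace(1.0, 60.0, 8)
intrinsics = np.array([[100.0, 0.0, w / 2], [0.0, 100.0, h / 2], [0.0, 0.0, 1.0]])
pts_prev = to_lidar(frustum_points(h, w, depths), intrinsics, np.eye(4))

prev_pose, cur_pose = np.eye(4), np.eye(4)
prev_pose[0, 3], cur_pose[0, 3] = 2.0, 3.0  # ego moved 1 m along x between frames
pts_aligned = align_to_current(pts_prev, prev_pose, cur_pose)

coords = normalize(pts_aligned, [-61.2, -61.2, -10.0, 61.2, 61.2, 10.0])
rng = np.random.default_rng(0)
pe = feature_guided_pe(
    coords.reshape(h * w, -1),                              # (h*w, D*3) positions
    rng.standard_normal((h * w, channels)),                 # stand-in image features
    rng.standard_normal((coords.shape[2] * 3, embed_dim)),  # hypothetical weights
    rng.standard_normal((channels, embed_dim)),
)
```

Because the alignment is a single rigid transform applied to the frustum coordinates, previous-frame features can reuse the same position-embedding path as the current frame; in this toy example every aligned point is simply shifted by -1 m along x.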

Results

Task	Dataset	Metric	Value	Model
3D Object Detection	nuScenes Camera Only	NDS	59.2	PETRv2-pure
BEV Semantic Segmentation	nuScenes	IoU lane - 224x480 - 100x100 at 0.5	44.8	PETRv2
3D Lane Detection	OpenLane	F1 (all)	61.2	PETRv2-V∗ (VoVNetV2 with 400 anchor points)
3D Lane Detection	OpenLane	F1 (all)	57.8	PETRv2-V (VoVNetV2)
3D Lane Detection	OpenLane	F1 (all)	51.9	PETRv2-E (EfficientNet)
