PETR: Position Embedding Transformation for Multi-View 3D Object Detection

Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun

2022-03-10
Robust Camera Only 3D Object Detection, object-detection, 3D Object Detection, Object Detection

Abstract

In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can then perceive these 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.
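The idea of encoding 3D coordinates into image features can be illustrated with a minimal sketch. It is not the released implementation: the pinhole back-projection, the fixed set of depth bins, the single linear layer standing in for a learned encoder, and all function and parameter names below are assumptions made for illustration only. For one pixel, the sketch lifts the pixel to several 3D points along its camera ray, moves them into a shared ego frame, normalizes them to [0, 1], and adds the resulting embedding to that pixel's 2D feature.

```python
from typing import List, Sequence

Vec = List[float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec:
    """Lift pixel (u, v) at a given depth to a 3D point in the camera frame (pinhole model)."""
    return [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]


def to_ego(point: Sequence[float], rotation: Sequence[Sequence[float]],
           translation: Sequence[float]) -> Vec:
    """Apply a rigid camera-to-ego transform: R @ p + t."""
    return [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
            for r in range(3)]


def normalize(point: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> Vec:
    """Scale each coordinate into [0, 1] over the region of interest, clamping outliers."""
    return [min(1.0, max(0.0, (p - l) / (h - l))) for p, l, h in zip(point, lo, hi)]


def position_embedding(coords: Sequence[float],
                       weights: Sequence[Sequence[float]]) -> Vec:
    """Hypothetical stand-in for a learned encoder: one linear layer, coords -> C channels."""
    return [sum(w * x for w, x in zip(row, coords)) for row in weights]


def position_aware_feature(feature, u, v, depths, intrinsics,
                           rotation, translation, lo, hi, weights) -> Vec:
    """Add an embedding of the pixel's 3D ray samples to its 2D image feature."""
    fx, fy, cx, cy = intrinsics
    coords: Vec = []
    for d in depths:  # one 3D point per assumed depth bin along the camera ray
        p_cam = backproject(u, v, d, fx, fy, cx, cy)
        coords.extend(normalize(to_ego(p_cam, rotation, translation), lo, hi))
    emb = position_embedding(coords, weights)
    return [f + e for f, e in zip(feature, emb)]


# Toy example: a pixel at the principal point, two depth bins, identity extrinsics.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # channel 0 reads x of the first point
           [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]   # channel 1 reads z of the second point
fused = position_aware_feature(
    feature=[1.0, 2.0], u=50.0, v=50.0, depths=[10.0, 20.0],
    intrinsics=(100.0, 100.0, 50.0, 50.0),
    rotation=identity, translation=[0.0, 0.0, 0.0],
    lo=[-50.0, -50.0, 0.0], hi=[50.0, 50.0, 40.0], weights=weights)
print(fused)  # [1.5, 2.5]
```

Because every camera's points are expressed in the same ego frame before embedding, features from different views end up tagged with comparable 3D positions, which is what would let a shared set of object queries attend across all views at once.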

Results

Task	Dataset	Metric	Value	Model
3D Object Detection	3D Object Detection on Argoverse2 Camera Only	Average mAP	17.6	PETR
3D Object Detection	TruckScenes	NDS	12.1	PETR
3D Object Detection	TruckScenes	mAP	2.2	PETR
