Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun
In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can then perceive these 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks first on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.
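
The idea of encoding 3D coordinates into image features can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a single camera, a hypothetical 4x4 projection matrix `proj`, a hand-picked set of depth bins, a region of interest used for normalisation, and a two-layer MLP with random weights `w1` and `w2`. All function and parameter names here are illustrative.

```python
import numpy as np


def frustum_points(h, w, depths):
    """Homogeneous frustum points (u*d, v*d, d, 1), shape (h, w, D, 4).

    Pixel centres and depth bins are an assumed discretisation.
    """
    v, u, d = np.meshgrid(
        np.arange(h) + 0.5, np.arange(w) + 0.5, depths, indexing="ij"
    )
    return np.stack([u * d, v * d, d, np.ones_like(d)], axis=-1)


def position_aware_features(feats, proj, depths, roi_min, roi_max, w1, w2):
    """Add an embedding of back-projected 3D coordinates to 2D features.

    feats: (h, w, c) image features; proj: assumed 4x4 matrix mapping
    3D points to (u*d, v*d, d, 1); w1, w2: hypothetical MLP weights.
    """
    h, w, _ = feats.shape
    pts = frustum_points(h, w, depths)               # (h, w, D, 4)
    xyz = (pts @ np.linalg.inv(proj).T)[..., :3]     # back-project to 3D
    xyz = (xyz - roi_min) / (roi_max - roi_min)      # normalise by the ROI
    flat = xyz.reshape(h, w, -1)                     # (h, w, 3 * D)
    hidden = np.maximum(flat @ w1, 0.0)              # ReLU layer
    pos_embed = hidden @ w2                          # (h, w, c)
    return feats + pos_embed                         # position-aware features


# Toy usage with made-up sizes and camera parameters.
rng = np.random.default_rng(0)
h, w, c = 4, 6, 8
depths = np.linspace(1.0, 10.0, 5)
proj = np.array([
    [2.0, 0.0, 3.0, 0.0],
    [0.0, 2.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
feats = rng.standard_normal((h, w, c))
w1 = rng.standard_normal((3 * len(depths), 16)) * 0.1
w2 = rng.standard_normal((16, c)) * 0.1
out = position_aware_features(
    feats, proj, depths,
    roi_min=np.array([-20.0, -20.0, 0.0]),
    roi_max=np.array([20.0, 20.0, 10.0]),
    w1=w1, w2=w2,
)
```

In a multi-view setting the same steps would be repeated per camera with that camera's projection matrix, so that all views share one 3D coordinate frame before object queries attend to them. That extension is left out of the sketch.
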
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Object Detection | 3D Object Detection on Argoverse2 Camera Only | Average mAP | 17.6 | PETR |
| 3D Object Detection | TruckScenes | NDS | 12.1 | PETR |
| 3D Object Detection | TruckScenes | mAP | 2.2 | PETR |