
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Aqi Gao, Tiancai Wang, Xiangyu Zhang, Jian Sun

2022-06-02 · ICCV 2023

Tasks: Segmentation, 3D Lane Detection, Bird's-Eye View Semantic Segmentation, Multi-Task Learning, BEV Segmentation, 3D Object Detection, Object Detection, Lane Detection

Abstract

In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which uses the temporal information of previous frames to boost 3D object detection. More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling. The 3D PE achieves temporal alignment of object positions across different frames. A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE. To support multi-task learning (e.g., BEV segmentation and 3D lane detection), PETRv2 provides a simple yet effective solution by introducing task-specific queries, which are initialized in different spaces. PETRv2 achieves state-of-the-art performance on 3D object detection, BEV segmentation and 3D lane detection. A detailed robustness analysis is also conducted on the PETR framework. We hope PETRv2 can serve as a strong baseline for 3D perception. Code is available at https://github.com/megvii-research/PETR.
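The temporal alignment of the 3D PE can be illustrated with a small NumPy sketch. It assumes a PETR-style pipeline in which each feature-map pixel is lifted to a few 3D points along its camera ray (one per depth bin) and mapped into the ego frame; for the previous frame t-1, those points are further mapped into the current ego frame t with the relative ego pose, so the position embeddings of both frames share one coordinate system. The feature-guided encoder is sketched as a position embedding gated element-wise by a sigmoid of the image features. All function names, the single linear layers standing in for the MLPs, and the normalization range are hypothetical choices for illustration; this is not the official PyTorch implementation in the PETR repository.

```python
import numpy as np

# Hypothetical helper names; a sketch of the idea, not the official PETR/PETRv2 code.


def frustum_points(h, w, depths, intrinsics):
    """Lift each pixel (u, v) of an h x w feature map to one 3D point per depth bin.

    Returns camera-frame coordinates with shape (h, w, D, 3).
    """
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    d = np.asarray(depths, dtype=float)
    # Homogeneous pixel coordinates scaled by depth: (u*d, v*d, d).
    pix = np.stack(
        [u[..., None] * d, v[..., None] * d, np.broadcast_to(d, (h, w, d.size))],
        axis=-1,
    )
    # Back-project with the inverse intrinsics: p_cam = K^-1 @ (u*d, v*d, d).
    return pix @ np.linalg.inv(intrinsics).T


def apply_transform(points, transform):
    """Apply a 4x4 homogeneous rigid transform to points of shape (..., 3)."""
    ones = np.ones(points.shape[:-1] + (1,))
    homo = np.concatenate([points, ones], axis=-1)
    return (homo @ transform.T)[..., :3]


def prev_to_current(ego_to_world_prev, ego_to_world_cur):
    """Relative pose that maps ego-frame coordinates at t-1 into the ego frame at t."""
    return np.linalg.inv(ego_to_world_cur) @ ego_to_world_prev


def normalize(points, lo, hi):
    """Scale coordinates into [0, 1] inside a hypothetical region of interest [lo, hi]."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    return np.clip((points - lo) / (hi - lo), 0.0, 1.0)


def feature_guided_pe(coords, feats, w_pos, w_feat):
    """Position embedding gated by image features: relu(coords @ w_pos) * sigmoid(feats @ w_feat).

    coords: (h, w, D, 3) normalized 3D points; feats: (h, w, C) image features.
    Single linear layers stand in for the MLPs (an assumption for brevity).
    """
    h, w, d, _ = coords.shape
    # Concatenate the D points of each pixel into one vector, then embed them.
    pos = np.maximum(coords.reshape(h, w, d * 3) @ w_pos, 0.0)
    # Sigmoid weights computed from the image features modulate the embedding.
    gate = 1.0 / (1.0 + np.exp(-(feats @ w_feat)))
    return pos * gate
```

For a previous-frame feature map, the aligned coordinates would be `apply_transform(apply_transform(frustum_points(...), cam_to_ego_prev), prev_to_current(pose_prev, pose_cur))`, where `cam_to_ego_prev`, `pose_prev` and `pose_cur` are hypothetical 4x4 calibration and ego-pose matrices; the result then goes through `normalize` and `feature_guided_pe` exactly like the current frame's coordinates. Gating by image features is one way to let the embedding adapt to the input data, which is the stated goal of the feature-guided encoder.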

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Autonomous Vehicles | OpenLane | F1 (all) | 61.2 | PETRv2-V∗ (VoVNetV2 with 400 anchor points) |
| Autonomous Vehicles | OpenLane | F1 (all) | 57.8 | PETRv2-V (VoVNetV2) |
| Autonomous Vehicles | OpenLane | F1 (all) | 51.9 | PETRv2-E (EfficientNet) |
| Semantic Segmentation | nuScenes | IoU lane - 224x480 - 100x100 at 0.5 | 44.8 | PETRv2 |
| Object Detection | nuScenes Camera Only | NDS | 59.2 | PETRv2-pure |
| 3D | nuScenes Camera Only | NDS | 59.2 | PETRv2-pure |
| 3D Object Detection | nuScenes Camera Only | NDS | 59.2 | PETRv2-pure |
| 2D Classification | nuScenes Camera Only | NDS | 59.2 | PETRv2-pure |
| Lane Detection | OpenLane | F1 (all) | 61.2 | PETRv2-V∗ (VoVNetV2 with 400 anchor points) |
| Lane Detection | OpenLane | F1 (all) | 57.8 | PETRv2-V (VoVNetV2) |
| Lane Detection | OpenLane | F1 (all) | 51.9 | PETRv2-E (EfficientNet) |
| 2D Object Detection | nuScenes Camera Only | NDS | 59.2 | PETRv2-pure |
| 10-shot image generation | nuScenes | IoU lane - 224x480 - 100x100 at 0.5 | 44.8 | PETRv2 |
| Bird's-Eye View Semantic Segmentation | nuScenes | IoU lane - 224x480 - 100x100 at 0.5 | 44.8 | PETRv2 |
| 16k | nuScenes Camera Only | NDS | 59.2 | PETRv2-pure |
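The NDS values above are the nuScenes Detection Score, which the nuScenes detection benchmark defines as NDS = (5 · mAP + Σ (1 − min(1, mTP))) / 10, summed over five mean true-positive errors: translation (mATE), scale (mASE), orientation (mAOE), velocity (mAVE) and attribute (mAAE). The score lies in [0, 1], so 59.2 in the table corresponds to 0.592. The function below is a minimal sketch of that formula; its name is hypothetical, and any inputs you pass it are placeholders, not PETRv2's reported metrics.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """nuScenes Detection Score (NDS) from mAP and the five mean true-positive errors.

    NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"expected TP errors {sorted(expected)}, got {sorted(tp_errors)}")
    # Each error is clipped at 1, so a very poor TP metric contributes 0 rather than a negative score.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

Because mAP is weighted by 5 and each of the five TP terms by 1, detection precision accounts for half of the score and box/attribute quality for the other half.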
