Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BEVFormer

Reported on 87 benchmarks across 9 tasks · 2 papers · 12 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Methodology · 48 results

  • 3D on Rope3D
    AP@0.7 · 2022-11-18
    24.64
    best: 75.27 (MonoUNI)
    BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (arXiv:2211.10439)
  • 2D Classification on Rope3D
    AP@0.7 · 2022-11-18
    24.64
    best: 75.27 (MonoUNI)
    BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (arXiv:2211.10439)
  • 2D Object Detection on Rope3D
    AP@0.7 · 2022-11-18
    24.64
    best: 75.27 (MonoUNI)
    BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (arXiv:2211.10439)
  • 16k on Rope3D
    AP@0.7 · 2022-11-18
    24.64
    best: 75.27 (MonoUNI)
    BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (arXiv:2211.10439)
  • 3D on nuScenes Camera Only
    NDS · 2022-03-31
    56.9
    best: 68.7 (Far3D)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    NDS · 2022-03-31
    0.57
    best: 55.3 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    mAAE · 2022-03-31
    0.13
    best: 1 (BirdNet+ (multisweep))
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    mAOE · 2022-03-31
    0.38
    best: 1.6 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    mAP · 2022-03-31
    0.48
    best: 45.1 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    mASE · 2022-03-31
    0.26
    best: 1 (qww)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    mATE · 2022-03-31
    0.58
    best: 1.06 (3D-GCK)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on nuScenes
    mAVE · 2022-03-31
    0.38
    best: 2.21 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on DAIR-V2X-I
    AP|R40 (easy) · 2022-03-31
    61.4
    best: 90.92 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on DAIR-V2X-I
    AP|R40 (hard) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D on DAIR-V2X-I
    AP|R40 (moderate) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes Camera Only
    NDS · 2022-03-31
    56.9
    best: 68.7 (Far3D)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    NDS · 2022-03-31
    0.57
    best: 55.3 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    mAAE · 2022-03-31
    0.13
    best: 1 (BirdNet+ (multisweep))
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    mAOE · 2022-03-31
    0.38
    best: 1.6 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    mAP · 2022-03-31
    0.48
    best: 45.1 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    mASE · 2022-03-31
    0.26
    best: 1 (qww)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    mATE · 2022-03-31
    0.58
    best: 1.06 (3D-GCK)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on nuScenes
    mAVE · 2022-03-31
    0.38
    best: 2.21 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on DAIR-V2X-I
    AP|R40 (easy) · 2022-03-31
    61.4
    best: 90.92 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on DAIR-V2X-I
    AP|R40 (hard) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Classification on DAIR-V2X-I
    AP|R40 (moderate) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes Camera Only
    NDS · 2022-03-31
    56.9
    best: 68.7 (Far3D)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    NDS · 2022-03-31
    0.57
    best: 55.3 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    mAAE · 2022-03-31
    0.13
    best: 1 (BirdNet+ (multisweep))
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    mAOE · 2022-03-31
    0.38
    best: 1.6 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    mAP · 2022-03-31
    0.48
    best: 45.1 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    mASE · 2022-03-31
    0.26
    best: 1 (qww)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    mATE · 2022-03-31
    0.58
    best: 1.06 (3D-GCK)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on nuScenes
    mAVE · 2022-03-31
    0.38
    best: 2.21 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on DAIR-V2X-I
    AP|R40 (easy) · 2022-03-31
    61.4
    best: 90.92 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on DAIR-V2X-I
    AP|R40 (hard) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 2D Object Detection on DAIR-V2X-I
    AP|R40 (moderate) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes Camera Only
    NDS · 2022-03-31
    56.9
    best: 68.7 (Far3D)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    NDS · 2022-03-31
    0.57
    best: 55.3 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    mAAE · 2022-03-31
    0.13
    best: 1 (BirdNet+ (multisweep))
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    mAOE · 2022-03-31
    0.38
    best: 1.6 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    mAP · 2022-03-31
    0.48
    best: 45.1 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    mASE · 2022-03-31
    0.26
    best: 1 (qww)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    mATE · 2022-03-31
    0.58
    best: 1.06 (3D-GCK)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on nuScenes
    mAVE · 2022-03-31
    0.38
    best: 2.21 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on DAIR-V2X-I
    AP|R40 (easy) · 2022-03-31
    61.4
    best: 90.92 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on DAIR-V2X-I
    AP|R40 (hard) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 16k on DAIR-V2X-I
    AP|R40 (moderate) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
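
The nuScenes entries above report mAP, NDS, and five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE). The nuScenes detection benchmark defines NDS as NDS = (5·mAP + Σ(1 − min(1, mTP))) / 10 over those five errors, so each error is clipped at 1 and lower errors are better. The minimal Python sketch below recomputes NDS from the rounded BEVFormer values listed; the function and variable names are illustrative and are not part of the official nuScenes devkit. Because the inputs are rounded to two decimals, the result only approximately matches the reported score.

```python
def nuscenes_detection_score(m_ap: float, tp_errors: dict[str, float]) -> float:
    """NDS = (5 * mAP + sum over the five TP errors of (1 - min(1, err))) / 10."""
    if len(tp_errors) != 5:
        raise ValueError("expected the five nuScenes TP error metrics")
    # Errors are lower-is-better; clipping at 1 keeps each term in [0, 1].
    tp_scores = sum(1.0 - min(1.0, err) for err in tp_errors.values())
    return (5.0 * m_ap + tp_scores) / 10.0


# Rounded BEVFormer values taken from the nuScenes entries listed above.
bevformer_errors = {
    "mATE": 0.58,  # translation error
    "mASE": 0.26,  # scale error
    "mAOE": 0.38,  # orientation error
    "mAVE": 0.38,  # velocity error
    "mAAE": 0.13,  # attribute error
}

nds = nuscenes_detection_score(0.48, bevformer_errors)
print(f"NDS ≈ {nds:.3f}")  # prints NDS ≈ 0.567
```

The recomputed 0.567 rounds to the listed NDS of 0.57. The BEVFormer values here use a 0 to 1 scale, whereas some other figures on this page (for example 56.9 on nuScenes Camera Only, or the "best" values 55.3 and 45.1) appear to be on a 0 to 100 scale.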

Computer Vision · 29 results

  • Bird's-Eye View Semantic Segmentation on nuScenes
    IoU lane - 224x480 - 100x100 at 0.5 · 2022-03-31
    25.7
    best: 49.6 (PointBeV (static))
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Bird's-Eye View Semantic Segmentation on nuScenes
    IoU veh - 224x480 - Vis filter - 100x100 at 0.5 · 2022-03-31
    42
    best: 44.7 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Bird's-Eye View Semantic Segmentation on nuScenes
    IoU veh - 448x800 - No vis filter - 100x100 at 0.5 · 2022-03-31
    39
    best: 43.2 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Bird's-Eye View Semantic Segmentation on nuScenes
    IoU veh - 448x800 - Vis filter - 100x100 at 0.5 · 2022-03-31
    45.5
    best: 48.7 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Bird's-Eye View Semantic Segmentation on nuScenes
    IoU veh - 224x480 - No vis filter - 100x100 at 0.5 · 2022-03-31
    35.8
    best: 39.9 (PointBeV)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on Rope3D
    AP@0.7 · 2022-11-18
    24.64
    best: 75.27 (MonoUNI)
    BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (arXiv:2211.10439)
  • 3D Object Detection on Rope3D
    AP@0.7 · 2022-11-18
    24.64
    best: 75.27 (MonoUNI)
    BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision (arXiv:2211.10439)
  • Object Detection on nuScenes Camera Only
    NDS · 2022-03-31
    56.9
    best: 68.7 (Far3D)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    NDS · 2022-03-31
    0.57
    best: 55.3 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    mAAE · 2022-03-31
    0.13
    best: 1 (BirdNet+ (multisweep))
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    mAOE · 2022-03-31
    0.38
    best: 1.6 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    mAP · 2022-03-31
    0.48
    best: 45.1 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    mASE · 2022-03-31
    0.26
    best: 1 (qww)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    mATE · 2022-03-31
    0.58
    best: 1.06 (3D-GCK)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on nuScenes
    mAVE · 2022-03-31
    0.38
    best: 2.21 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on DAIR-V2X-I
    AP|R40 (easy) · 2022-03-31
    61.4
    best: 90.92 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on DAIR-V2X-I
    AP|R40 (hard) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Object Detection on DAIR-V2X-I
    AP|R40 (moderate) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes Camera Only
    NDS · 2022-03-31
    56.9
    best: 68.7 (Far3D)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    NDS · 2022-03-31
    0.57
    best: 55.3 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    mAAE · 2022-03-31
    0.13
    best: 1 (BirdNet+ (multisweep))
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    mAOE · 2022-03-31
    0.38
    best: 1.6 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    mAP · 2022-03-31
    0.48
    best: 45.1 (LabelDistill)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    mASE · 2022-03-31
    0.26
    best: 1 (qww)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    mATE · 2022-03-31
    0.58
    best: 1.06 (3D-GCK)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on nuScenes
    mAVE · 2022-03-31
    0.38
    best: 2.21 (PointNet)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on DAIR-V2X-I
    AP|R40 (easy) · 2022-03-31
    61.4
    best: 90.92 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on DAIR-V2X-I
    AP|R40 (hard) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 3D Object Detection on DAIR-V2X-I
    AP|R40 (moderate) · 2022-03-31
    50.7
    best: 87.2 (MonoUNI)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
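
The DAIR-V2X-I entries report AP|R40 at three difficulty levels. In the KITTI-style convention this metric name usually refers to, average precision is the mean of the interpolated precision at 40 recall positions r ∈ {1/40, 2/40, ..., 1}, where the interpolated precision at r is the highest precision reached at any recall ≥ r. The Python sketch below assumes that convention and starts from an already-computed precision/recall curve; it does not model box matching, the IoU threshold, or the easy/moderate/hard filtering, and both the `ap_r40` helper and the toy curve are hypothetical.

```python
def ap_r40(recalls: list[float], precisions: list[float]) -> float:
    """Average precision over the 40 recall positions 1/40, 2/40, ..., 1 (KITTI-style R40)."""
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")
    total = 0.0
    for k in range(1, 41):
        r = k / 40
        # Interpolated precision: best precision at any recall >= r; 0 if r is never reached.
        total += max((p for rc, p in zip(recalls, precisions) if rc >= r), default=0.0)
    return total / 40


# Hypothetical precision/recall curve with three operating points.
recalls = [0.25, 0.50, 0.75]
precisions = [1.0, 0.8, 0.6]
print(f"AP|R40 = {ap_r40(recalls, precisions):.3f}")  # prints AP|R40 = 0.600
```

Sampling 40 positions and excluding r = 0 is what distinguishes R40 from the older 11-point (R11) KITTI metric, which sampled {0, 0.1, ..., 1} and therefore always credited the precision at zero recall.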

Medical · 5 results

  • Semantic Segmentation on nuScenes
    IoU lane - 224x480 - 100x100 at 0.5 · 2022-03-31
    25.7
    best: 49.6 (PointBeV (static))
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Semantic Segmentation on nuScenes
    IoU veh - 224x480 - Vis filter - 100x100 at 0.5 · 2022-03-31
    42
    best: 44.7 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Semantic Segmentation on nuScenes
    IoU veh - 448x800 - No vis filter - 100x100 at 0.5 · 2022-03-31
    39
    best: 43.2 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Semantic Segmentation on nuScenes
    IoU veh - 448x800 - Vis filter - 100x100 at 0.5 · 2022-03-31
    45.5
    best: 48.7 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • Semantic Segmentation on nuScenes
    IoU veh - 224x480 - No vis filter - 100x100 at 0.5 · 2022-03-31
    35.8
    best: 39.9 (PointBeV)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
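
The segmentation entries report per-class IoU (lane, vehicle) on a bird's-eye-view grid. As a minimal sketch, assuming the model outputs a per-cell probability that is binarized with a fixed threshold (0.5 here, which is an assumption and not something the listing specifies), IoU is the number of cells positive in both prediction and ground truth divided by the number positive in either. The other parts of the metric names (input resolution, visibility filtering, grid settings) are not modeled, and the `bev_iou` helper and the 3×3 grid are hypothetical.

```python
def bev_iou(pred_probs: list[list[float]], target: list[list[int]], threshold: float = 0.5) -> float:
    """Single-class IoU on a BEV grid: |pred AND target| / |pred OR target|."""
    intersection = 0
    union = 0
    for p_row, t_row in zip(pred_probs, target):
        for p, t in zip(p_row, t_row):
            pred_on = p >= threshold  # binarize the predicted probability
            target_on = bool(t)
            intersection += pred_on and target_on
            union += pred_on or target_on
    # Convention for this sketch: an empty union (class absent everywhere) scores 0.0.
    return intersection / union if union else 0.0


# Hypothetical 3x3 BEV grid for one class (e.g. vehicle).
pred = [[0.9, 0.2, 0.7],
        [0.6, 0.1, 0.4],
        [0.3, 0.8, 0.05]]
target = [[1, 0, 1],
          [0, 0, 1],
          [0, 1, 0]]
print(bev_iou(pred, target))  # prints 0.6
```

Three cells are positive in both maps and five are positive in at least one, giving 3/5 = 0.6; the page reports such IoUs as percentages (e.g. 25.7 for lanes).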

Audio · 5 results

  • 10-shot image generation on nuScenes
    IoU lane - 224x480 - 100x100 at 0.5 · 2022-03-31
    25.7
    best: 49.6 (PointBeV (static))
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 10-shot image generation on nuScenes
    IoU veh - 224x480 - Vis filter - 100x100 at 0.5 · 2022-03-31
    42
    best: 44.7 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 10-shot image generation on nuScenes
    IoU veh - 448x800 - No vis filter - 100x100 at 0.5 · 2022-03-31
    39
    best: 43.2 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 10-shot image generation on nuScenes
    IoU veh - 448x800 - Vis filter - 100x100 at 0.5 · 2022-03-31
    45.5
    best: 48.7 (PointBeV)
    SOTA
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)
  • 10-shot image generation on nuScenes
    IoU veh - 224x480 - No vis filter - 100x100 at 0.5 · 2022-03-31
    35.8
    best: 39.9 (PointBeV)
    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (arXiv:2203.17270)