Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VTP

Reported on 32 benchmarks across 9 tasks · 2 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
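As a rough illustration of the note above, the sketch below groups result records by exact model name. The record layout and the helper `group_by_exact_name` are hypothetical and are not taken from the site's code; the sample values are copied from this page. It shows how two unrelated papers that both use the name "VTP" end up on one page, while a differently named variant does not.

```python
from collections import defaultdict

# Hypothetical record layout: (model_name, paper, benchmark, metric, value).
# Values are copied from this page; None marks a paper not stated here.
RESULTS = [
    ("VTP", "arXiv:2205.12602", "Shelf", "MPJPE", 56.3),
    ("VTP", "arXiv:2110.07603", "LRS2", "Word Error Rate (WER)", 28.9),
    ("VTP with more data", None, "LRS2", "Word Error Rate (WER)", 22.6),
]


def group_by_exact_name(records):
    """Group records under their model name using exact string equality."""
    grouped = defaultdict(list)
    for record in records:
        name = record[0]
        grouped[name].append(record)
    return dict(grouped)


by_name = group_by_exact_name(RESULTS)

# Both papers land under "VTP" because the names are identical strings.
vtp_papers = {record[1] for record in by_name["VTP"]}

# The variant name differs, so it is kept in a separate group.
variant_names = sorted(name for name in by_name if name != "VTP")
```

Exact matching keeps the lookup simple, but it cannot tell apart models from different papers that happen to share a name, so each result should be read together with the paper listed under it.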

Computer Vision · 16 results

  • 3D Human Pose Estimation on Shelf
    MPJPE · 2022-05-25
    56.3
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Human Pose Estimation on Campus
    Mean mAP · 2022-05-25
    80.1
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Pose Estimation on Shelf
    MPJPE · 2022-05-25
    56.3
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Pose Estimation on Campus
    Mean mAP · 2022-05-25
    80.1
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Multi-Person Pose Estimation on Shelf
    MPJPE · 2022-05-25
    56.3
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Multi-Person Pose Estimation on Campus
    Mean mAP · 2022-05-25
    80.1
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Lipreading on LRS2
    Word Error Rate (WER) · uses extra data · 2021-10-14
    28.9
    best: 14.6 (Auto-AVSR)
    SOTA
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603
  • 3D Human Pose Estimation on Panoptic
    Average MPJPE (mm) · uses extra data · 2022-05-25
    17.62
    best: 135.4 (BMP)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Human Pose Estimation on Shelf
    PCP3D · 2022-05-25
    97.3
    best: 100 (RapidPoseTriangulation (with corrected labels))
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Human Pose Estimation on Campus
    PCP3D · 2022-05-25
    96.3
    best: 97.4 (TesseTrack)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Pose Estimation on Panoptic
    Average MPJPE (mm) · uses extra data · 2022-05-25
    17.62
    best: 135.4 (BMP)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Pose Estimation on Shelf
    PCP3D · 2022-05-25
    97.3
    best: 100 (RapidPoseTriangulation (with corrected labels))
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Pose Estimation on Campus
    PCP3D · 2022-05-25
    96.3
    best: 97.4 (TesseTrack)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Multi-Person Pose Estimation on Shelf
    PCP3D · 2022-05-25
    97.3
    best: 100 (RapidPoseTriangulation (with corrected labels))
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D Multi-Person Pose Estimation on Campus
    PCP3D · 2022-05-25
    96.3
    best: 97.4 (TesseTrack)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Lipreading on LRS3-TED
    Word Error Rate (WER) · uses extra data · 2021-10-14
    40.6
    best: 12.8 (LP + Conformer)
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603

Audio · 7 results

  • 1 Image, 2*2 Stitchi on Shelf
    MPJPE · 2022-05-25
    56.3
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 1 Image, 2*2 Stitchi on Campus
    Mean mAP · 2022-05-25
    80.1
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • Speech Recognition on LRS3-TED
    Word Error Rate (WER) · uses extra data · 2021-10-14
    40.6
    best: 0.68 (Whisper)
    SOTA
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603
  • Speech Recognition on LRS2
    Word Error Rate (WER) · uses extra data · 2021-10-14
    28.9
    best: 2.1 (RAVEn Large)
    SOTA
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603
  • 1 Image, 2*2 Stitchi on Panoptic
    Average MPJPE (mm) · uses extra data · 2022-05-25
    17.62
    best: 135.4 (BMP)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 1 Image, 2*2 Stitchi on Shelf
    PCP3D · 2022-05-25
    97.3
    best: 100 (RapidPoseTriangulation (with corrected labels))
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 1 Image, 2*2 Stitchi on Campus
    PCP3D · 2022-05-25
    96.3
    best: 97.4 (TesseTrack)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602

Methodology · 5 results

  • 3D on Shelf
    MPJPE · 2022-05-25
    56.3
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D on Campus
    Mean mAP · 2022-05-25
    80.1
    SOTA
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D on Panoptic
    Average MPJPE (mm) · uses extra data · 2022-05-25
    17.62
    best: 135.4 (BMP)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D on Shelf
    PCP3D · 2022-05-25
    97.3
    best: 100 (RapidPoseTriangulation (with corrected labels))
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602
  • 3D on Campus
    PCP3D · 2022-05-25
    96.3
    best: 97.4 (TesseTrack)
    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation · arXiv:2205.12602

Natural Language Processing · 2 results

  • Natural Language Transduction on LRS2
    Word Error Rate (WER) · uses extra data · 2021-10-14
    28.9
    best: 14.6 (Auto-AVSR)
    SOTA
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603
  • Natural Language Transduction on LRS3-TED
    Word Error Rate (WER) · uses extra data · 2021-10-14
    40.6
    best: 12.8 (LP + Conformer)
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603

Speech · 2 results

  • Visual Speech Recognition on LRS3-TED
    Word Error Rate (WER) · uses extra data · 2021-10-14
    40.6
    best: 19.1 (CTC/Attention)
    SOTA
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603
  • Visual Speech Recognition on LRS2
    Word Error Rate (WER) · uses extra data · 2021-10-14
    28.9
    best: 22.6 (VTP with more data)
    SOTA
    Sub-word Level Lip Reading With Visual Attention · arXiv:2110.07603