Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Li et al.

Reported on 77 benchmarks across 15 tasks · 5 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
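
A minimal sketch of what exact-name matching implies, assuming a hypothetical record layout of (model name, arXiv id, benchmark, metric, value). The function names and the data structure are illustrative and are not the site's actual implementation; the values are copied from the entries listed on this page.

```python
from collections import defaultdict

# Hypothetical records: (model name, arXiv id, benchmark, metric, value).
# None marks a paper id that this page does not state.
results = [
    ("Li et al.", "2103.06669", "GTEA", "mAP@0.5", 28.8),
    ("Li et al.", "2008.05770", "Human3.6M", "Average PMPJPE (mm)", 44.3),
    ("Li et al.", "1607.00730", "NYU-Depth V2", "RMSE", 0.635),
    ("HR-Pro", None, "BEOID", "mAP@0.5", 55.3),
]


def group_by_exact_name(records):
    """Group results by the literal model-name string, with no normalization."""
    grouped = defaultdict(list)
    for name, paper, benchmark, metric, value in records:
        grouped[name].append((paper, benchmark, metric, value))
    return dict(grouped)


def distinct_papers(entries):
    """Return the sorted arXiv ids behind one model-name page."""
    return sorted({paper for paper, *_ in entries if paper is not None})


pages = group_by_exact_name(results)
li_papers = distinct_papers(pages["Li et al."])
print(li_papers)
```

Under this kind of grouping, one name such as "Li et al." collects results from unrelated papers, so entries on the same page are not necessarily comparable.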

Computer Vision · 63 results

  • Video on GTEA
    mAP@0.1:0.7 · 2021-03-11
    36.4
    best: 76.9 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Video on GTEA
    mAP@0.5 · 2021-03-11
    28.8
    best: 66.3 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Temporal Action Localization on GTEA
    mAP@0.1:0.7 · 2021-03-11
    36.4
    best: 76.9 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Temporal Action Localization on GTEA
    mAP@0.5 · 2021-03-11
    28.8
    best: 66.3 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Action Localization on GTEA
    mAP@0.1:0.7 · 2021-03-11
    36.4
    best: 76.9 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Action Localization on GTEA
    mAP@0.5 · 2021-03-11
    28.8
    best: 66.3 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Weakly Supervised Action Localization on GTEA
    mAP@0.1:0.7 · 2021-03-11
    36.4
    best: 76.9 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Weakly Supervised Action Localization on GTEA
    mAP@0.5 · 2021-03-11
    28.8
    best: 66.3 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • 3D Human Pose Estimation on Human3.6M
    Average PMPJPE (mm) · 2020-08-13
    44.3
    SOTA
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • Pose Estimation on Human3.6M
    Average PMPJPE (mm) · 2020-08-13
    44.3
    SOTA
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • Depth Estimation on NYU-Depth V2
    RMSE · 2016-07-04
    0.635
    best: 0.013 (Defocus/DepthNet (Normalized))
    SOTA
    A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images · arXiv:1607.00730
  • Video on BEOID
    mAP@0.1:0.7 · 2021-03-11
    34.4
    best: 59.4 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Video on BEOID
    mAP@0.5 · 2021-03-11
    20.3
    best: 55.3 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Temporal Action Localization on BEOID
    mAP@0.1:0.7 · 2021-03-11
    34.4
    best: 59.4 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Temporal Action Localization on BEOID
    mAP@0.5 · 2021-03-11
    20.3
    best: 55.3 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Action Localization on BEOID
    mAP@0.1:0.7 · 2021-03-11
    34.4
    best: 59.4 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Action Localization on BEOID
    mAP@0.5 · 2021-03-11
    20.3
    best: 55.3 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Weakly Supervised Action Localization on BEOID
    mAP@0.1:0.7 · 2021-03-11
    34.4
    best: 59.4 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Weakly Supervised Action Localization on BEOID
    mAP@0.5 · 2021-03-11
    20.3
    best: 55.3 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • 3D Human Pose Estimation on Human3.6M
    Average MPJPE (mm) · 2020-08-13
    73.9
    best: 131.7 (Rhodin et al.)
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • Pose Estimation on Human3.6M
    Average MPJPE (mm) · 2020-08-13
    73.9
    best: 131.7 (Rhodin et al.)
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • 3D Human Pose Estimation on Human3.6M
    Average MPJPE (mm) · 2020-06-14
    62.9
    best: 131.7 (Rhodin et al.)
    Cascaded deep monocular 3D human pose estimation with evolutionary training data · arXiv:2006.07778
  • Pose Estimation on Human3.6M
    Average MPJPE (mm) · 2020-06-14
    62.9
    best: 131.7 (Rhodin et al.)
    Cascaded deep monocular 3D human pose estimation with evolutionary training data · arXiv:2006.07778
  • 3D Human Pose Estimation on Human3.6M
    Number of Frames Per View
    1
    best: 243 (VideoPose3D (T=243))
  • 3D Human Pose Estimation on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)
  • 3D Human Pose Estimation on Human3.6M
    Average MPJPE (mm)
    88.8
    best: 131.7 (Rhodin et al.)
  • 3D Human Pose Estimation on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)
  • Pose Estimation on Human3.6M
    Number of Frames Per View
    1
    best: 243 (VideoPose3D (T=243))
  • Pose Estimation on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)
  • Pose Estimation on Human3.6M
    Average MPJPE (mm)
    88.8
    best: 131.7 (Rhodin et al.)
  • Pose Estimation on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)
  • Instance Segmentation on A2D Sentences
    AP
    0.163
    best: 0.585 (SgMg (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    IoU mean
    0.354
    best: 0.725 (SOC (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    IoU overall
    0.515
    best: 0.807 (SOC (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    Precision@0.5
    0.387
    best: 0.851 (SOC (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    Precision@0.6
    0.29
    best: 0.827 (SOC (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    Precision@0.7
    0.175
    best: 0.767 (SgMg (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    Precision@0.8
    0.066
    best: 0.617 (SgMg (Video-Swin-B))
  • Instance Segmentation on A2D Sentences
    Precision@0.9
    0.001
    best: 0.259 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    AP
    0.173
    best: 0.45 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    IoU mean
    0.491
    best: 0.725 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    IoU overall
    0.529
    best: 0.737 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    Precision@0.5
    0.578
    best: 0.972 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    Precision@0.6
    0.335
    best: 0.917 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    Precision@0.7
    0.103
    best: 0.714 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    Precision@0.8
    0.06
    best: 0.225 (SgMg (Video-Swin-B))
  • Instance Segmentation on J-HMDB
    Precision@0.9
    0
    best: 0.4 (HINet)
  • Referring Expression Segmentation on A2D Sentences
    AP
    0.163
    best: 0.585 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    IoU mean
    0.354
    best: 0.725 (SOC (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    IoU overall
    0.515
    best: 0.807 (SOC (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    Precision@0.5
    0.387
    best: 0.851 (SOC (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    Precision@0.6
    0.29
    best: 0.827 (SOC (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    Precision@0.7
    0.175
    best: 0.767 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    Precision@0.8
    0.066
    best: 0.617 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on A2D Sentences
    Precision@0.9
    0.001
    best: 0.259 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    AP
    0.173
    best: 0.45 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    IoU mean
    0.491
    best: 0.725 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    IoU overall
    0.529
    best: 0.737 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    Precision@0.5
    0.578
    best: 0.972 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    Precision@0.6
    0.335
    best: 0.917 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    Precision@0.7
    0.103
    best: 0.714 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    Precision@0.8
    0.06
    best: 0.225 (SgMg (Video-Swin-B))
  • Referring Expression Segmentation on J-HMDB
    Precision@0.9
    0
    best: 0.4 (HINet)

Methodology · 12 results

  • Zero-Shot Learning on GTEA
    mAP@0.1:0.7 · 2021-03-11
    36.4
    best: 76.9 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Zero-Shot Learning on GTEA
    mAP@0.5 · 2021-03-11
    28.8
    best: 66.3 (AU-Action)
    SOTA
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • 3D on Human3.6M
    Average PMPJPE (mm) · 2020-08-13
    44.3
    SOTA
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • 3D on NYU-Depth V2
    RMSE · 2016-07-04
    0.635
    best: 0.013 (Defocus/DepthNet (Normalized))
    SOTA
    A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images · arXiv:1607.00730
  • Zero-Shot Learning on BEOID
    mAP@0.1:0.7 · 2021-03-11
    34.4
    best: 59.4 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • Zero-Shot Learning on BEOID
    mAP@0.5 · 2021-03-11
    20.3
    best: 55.3 (HR-Pro)
    Temporal Action Segmentation from Timestamp Supervision · arXiv:2103.06669
  • 3D on Human3.6M
    Average MPJPE (mm) · 2020-08-13
    73.9
    best: 131.7 (Rhodin et al.)
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • 3D on Human3.6M
    Average MPJPE (mm) · 2020-06-14
    62.9
    best: 131.7 (Rhodin et al.)
    Cascaded deep monocular 3D human pose estimation with evolutionary training data · arXiv:2006.07778
  • 3D on Human3.6M
    Number of Frames Per View
    1
    best: 243 (VideoPose3D (T=243))
  • 3D on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)
  • 3D on Human3.6M
    Average MPJPE (mm)
    88.8
    best: 131.7 (Rhodin et al.)
  • 3D on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)

Audio · 7 results

  • 1 Image, 2*2 Stitchi on Human3.6M
    Average PMPJPE (mm) · 2020-08-13
    44.3
    SOTA
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • 1 Image, 2*2 Stitchi on Human3.6M
    Average MPJPE (mm) · 2020-08-13
    73.9
    best: 131.7 (Rhodin et al.)
    Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses · arXiv:2008.05770
  • 1 Image, 2*2 Stitchi on Human3.6M
    Average MPJPE (mm) · 2020-06-14
    62.9
    best: 131.7 (Rhodin et al.)
    Cascaded deep monocular 3D human pose estimation with evolutionary training data · arXiv:2006.07778
  • 1 Image, 2*2 Stitchi on Human3.6M
    Number of Frames Per View
    1
    best: 243 (VideoPose3D (T=243))
  • 1 Image, 2*2 Stitchi on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)
  • 1 Image, 2*2 Stitchi on Human3.6M
    Average MPJPE (mm)
    88.8
    best: 131.7 (Rhodin et al.)
  • 1 Image, 2*2 Stitchi on Human3.6M
    Number of Views
    1
    best: 2 (Kocabas et al.)

Natural Language Processing · 7 results

  • Semantic Role Labeling on OntoNotes
    F1 · 2019-01-16
    86
    best: 88.59 (HeSyFu)
    Dependency or Span, End-to-End Uniform Semantic Role Labeling · arXiv:1901.05280
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-1
    41.54
    best: 48.18 (Scrambled code + broken (alter))
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-2
    18.18
    best: 24.02 (Pegasus)
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-L
    36.47
    best: 45.35 (Scrambled code + broken (alter))
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-1
    40.3
    best: 48.18 (Scrambled code + broken (alter))
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-2
    18.02
    best: 24.02 (Pegasus)
  • Abstractive Text Summarization on CNN / Daily Mail
    ROUGE-L
    37.36
    best: 45.35 (Scrambled code + broken (alter))

Knowledge Base · 6 results

  • Text Summarization on CNN / Daily Mail
    ROUGE-1
    41.54
    best: 48.18 (Scrambled code + broken (alter))
  • Text Summarization on CNN / Daily Mail
    ROUGE-2
    18.18
    best: 24.02 (Pegasus)
  • Text Summarization on CNN / Daily Mail
    ROUGE-L
    36.47
    best: 45.35 (Scrambled code + broken (alter))
  • Text Summarization on CNN / Daily Mail
    ROUGE-1
    40.3
    best: 48.18 (Scrambled code + broken (alter))
  • Text Summarization on CNN / Daily Mail
    ROUGE-2
    18.02
    best: 24.02 (Pegasus)
  • Text Summarization on CNN / Daily Mail
    ROUGE-L
    37.36
    best: 45.35 (Scrambled code + broken (alter))