Li et al.

Reported on 77 benchmarks across 15 tasks · 5 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 63 results

Video on GTEA
mAP@0.1:0.7 · 2021-03-11
36.4
best: 76.9 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Video on GTEA
mAP@0.5 · 2021-03-11
28.8
best: 66.3 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Temporal Action Localization on GTEA
mAP@0.1:0.7 · 2021-03-11
36.4
best: 76.9 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Temporal Action Localization on GTEA
mAP@0.5 · 2021-03-11
28.8
best: 66.3 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Action Localization on GTEA
mAP@0.1:0.7 · 2021-03-11
36.4
best: 76.9 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Action Localization on GTEA
mAP@0.5 · 2021-03-11
28.8
best: 66.3 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Weakly Supervised Action Localization on GTEA
mAP@0.1:0.7 · 2021-03-11
36.4
best: 76.9 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Weakly Supervised Action Localization on GTEA
mAP@0.5 · 2021-03-11
28.8
best: 66.3 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
3D Human Pose Estimation on Human3.6M
Average PMPJPE (mm) · 2020-08-13
44.3
SOTA
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
Pose Estimation on Human3.6M
Average PMPJPE (mm) · 2020-08-13
44.3
SOTA
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
Depth Estimation on NYU-Depth V2
RMSE · 2016-07-04
0.635
best: 0.013 (Defocus/DepthNet (Normalized))
SOTA
A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images arXiv:1607.00730
Video on BEOID
mAP@0.1:0.7 · 2021-03-11
34.4
best: 59.4 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Video on BEOID
mAP@0.5 · 2021-03-11
20.3
best: 55.3 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Temporal Action Localization on BEOID
mAP@0.1:0.7 · 2021-03-11
34.4
best: 59.4 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Temporal Action Localization on BEOID
mAP@0.5 · 2021-03-11
20.3
best: 55.3 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Action Localization on BEOID
mAP@0.1:0.7 · 2021-03-11
34.4
best: 59.4 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Action Localization on BEOID
mAP@0.5 · 2021-03-11
20.3
best: 55.3 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Weakly Supervised Action Localization on BEOID
mAP@0.1:0.7 · 2021-03-11
34.4
best: 59.4 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Weakly Supervised Action Localization on BEOID
mAP@0.5 · 2021-03-11
20.3
best: 55.3 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
3D Human Pose Estimation on Human3.6M
Average MPJPE (mm) · 2020-08-13
73.9
best: 131.7 (Rhodin et al.)
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
Pose Estimation on Human3.6M
Average MPJPE (mm) · 2020-08-13
73.9
best: 131.7 (Rhodin et al.)
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
3D Human Pose Estimation on Human3.6M
Average MPJPE (mm) · 2020-06-14
62.9
best: 131.7 (Rhodin et al.)
Cascaded deep monocular 3D human pose estimation with evolutionary training data arXiv:2006.07778
Pose Estimation on Human3.6M
Average MPJPE (mm) · 2020-06-14
62.9
best: 131.7 (Rhodin et al.)
Cascaded deep monocular 3D human pose estimation with evolutionary training data arXiv:2006.07778
3D Human Pose Estimation on Human3.6M
Number of Frames Per View
1
best: 243 (VideoPose3D (T=243))
3D Human Pose Estimation on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)
3D Human Pose Estimation on Human3.6M
Average MPJPE (mm)
88.8
best: 131.7 (Rhodin et al.)
3D Human Pose Estimation on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)
Pose Estimation on Human3.6M
Number of Frames Per View
1
best: 243 (VideoPose3D (T=243))
Pose Estimation on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)
Pose Estimation on Human3.6M
Average MPJPE (mm)
88.8
best: 131.7 (Rhodin et al.)
Pose Estimation on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)
Instance Segmentation on A2D Sentences
AP
0.163
best: 0.585 (SgMg (Video-Swin-B))
Instance Segmentation on A2D Sentences
IoU mean
0.354
best: 0.725 (SOC (Video-Swin-B))
Instance Segmentation on A2D Sentences
IoU overall
0.515
best: 0.807 (SOC (Video-Swin-B))
Instance Segmentation on A2D Sentences
Precision@0.5
0.387
best: 0.851 (SOC (Video-Swin-B))
Instance Segmentation on A2D Sentences
Precision@0.6
0.29
best: 0.827 (SOC (Video-Swin-B))
Instance Segmentation on A2D Sentences
Precision@0.7
0.175
best: 0.767 (SgMg (Video-Swin-B))
Instance Segmentation on A2D Sentences
Precision@0.8
0.066
best: 0.617 (SgMg (Video-Swin-B))
Instance Segmentation on A2D Sentences
Precision@0.9
0.001
best: 0.259 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
AP
0.173
best: 0.45 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
IoU mean
0.491
best: 0.725 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
IoU overall
0.529
best: 0.737 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
Precision@0.5
0.578
best: 0.972 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
Precision@0.6
0.335
best: 0.917 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
Precision@0.7
0.103
best: 0.714 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
Precision@0.8
0.06
best: 0.225 (SgMg (Video-Swin-B))
Instance Segmentation on J-HMDB
Precision@0.9
0
best: 0.4 (HINet)
Referring Expression Segmentation on A2D Sentences
AP
0.163
best: 0.585 (SgMg (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
IoU mean
0.354
best: 0.725 (SOC (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
IoU overall
0.515
best: 0.807 (SOC (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
Precision@0.5
0.387
best: 0.851 (SOC (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
Precision@0.6
0.29
best: 0.827 (SOC (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
Precision@0.7
0.175
best: 0.767 (SgMg (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
Precision@0.8
0.066
best: 0.617 (SgMg (Video-Swin-B))
Referring Expression Segmentation on A2D Sentences
Precision@0.9
0.001
best: 0.259 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
AP
0.173
best: 0.45 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
IoU mean
0.491
best: 0.725 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
IoU overall
0.529
best: 0.737 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
Precision@0.5
0.578
best: 0.972 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
Precision@0.6
0.335
best: 0.917 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
Precision@0.7
0.103
best: 0.714 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
Precision@0.8
0.06
best: 0.225 (SgMg (Video-Swin-B))
Referring Expression Segmentation on J-HMDB
Precision@0.9
0
best: 0.4 (HINet)

Methodology · 12 results

Zero-Shot Learning on GTEA
mAP@0.1:0.7 · 2021-03-11
36.4
best: 76.9 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Zero-Shot Learning on GTEA
mAP@0.5 · 2021-03-11
28.8
best: 66.3 (AU-Action)
SOTA
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
3D on Human3.6M
Average PMPJPE (mm) · 2020-08-13
44.3
SOTA
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
3D on NYU-Depth V2
RMSE · 2016-07-04
0.635
best: 0.013 (Defocus/DepthNet (Normalized))
SOTA
A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images arXiv:1607.00730
Zero-Shot Learning on BEOID
mAP@0.1:0.7 · 2021-03-11
34.4
best: 59.4 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
Zero-Shot Learning on BEOID
mAP@0.5 · 2021-03-11
20.3
best: 55.3 (HR-Pro)
Temporal Action Segmentation from Timestamp Supervision arXiv:2103.06669
3D on Human3.6M
Average MPJPE (mm) · 2020-08-13
73.9
best: 131.7 (Rhodin et al.)
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
3D on Human3.6M
Average MPJPE (mm) · 2020-06-14
62.9
best: 131.7 (Rhodin et al.)
Cascaded deep monocular 3D human pose estimation with evolutionary training data arXiv:2006.07778
3D on Human3.6M
Number of Frames Per View
1
best: 243 (VideoPose3D (T=243))
3D on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)
3D on Human3.6M
Average MPJPE (mm)
88.8
best: 131.7 (Rhodin et al.)
3D on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)

Audio · 7 results

1 Image, 2*2 Stitchi on Human3.6M
Average PMPJPE (mm) · 2020-08-13
44.3
SOTA
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
1 Image, 2*2 Stitchi on Human3.6M
Average MPJPE (mm) · 2020-08-13
73.9
best: 131.7 (Rhodin et al.)
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses arXiv:2008.05770
1 Image, 2*2 Stitchi on Human3.6M
Average MPJPE (mm) · 2020-06-14
62.9
best: 131.7 (Rhodin et al.)
Cascaded deep monocular 3D human pose estimation with evolutionary training data arXiv:2006.07778
1 Image, 2*2 Stitchi on Human3.6M
Number of Frames Per View
1
best: 243 (VideoPose3D (T=243))
1 Image, 2*2 Stitchi on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)
1 Image, 2*2 Stitchi on Human3.6M
Average MPJPE (mm)
88.8
best: 131.7 (Rhodin et al.)
1 Image, 2*2 Stitchi on Human3.6M
Number of Views
1
best: 2 (Kocabas et al.)

Natural Language Processing · 7 results

Semantic Role Labeling on OntoNotes
F1 · 2019-01-16
86
best: 88.59 (HeSyFu)
Dependency or Span, End-to-End Uniform Semantic Role Labeling arXiv:1901.05280
Abstractive Text Summarization on CNN / Daily Mail
ROUGE-1
41.54
best: 48.18 (Scrambled code + broken (alter))
Abstractive Text Summarization on CNN / Daily Mail
ROUGE-2
18.18
best: 24.02 (Pegasus)
Abstractive Text Summarization on CNN / Daily Mail
ROUGE-L
36.47
best: 45.35 (Scrambled code + broken (alter))
Abstractive Text Summarization on CNN / Daily Mail
ROUGE-1
40.3
best: 48.18 (Scrambled code + broken (alter))
Abstractive Text Summarization on CNN / Daily Mail
ROUGE-2
18.02
best: 24.02 (Pegasus)
Abstractive Text Summarization on CNN / Daily Mail
ROUGE-L
37.36
best: 45.35 (Scrambled code + broken (alter))

Knowledge Base · 6 results

Text Summarization on CNN / Daily Mail
ROUGE-1
41.54
best: 48.18 (Scrambled code + broken (alter))
Text Summarization on CNN / Daily Mail
ROUGE-2
18.18
best: 24.02 (Pegasus)
Text Summarization on CNN / Daily Mail
ROUGE-L
36.47
best: 45.35 (Scrambled code + broken (alter))
Text Summarization on CNN / Daily Mail
ROUGE-1
40.3
best: 48.18 (Scrambled code + broken (alter))
Text Summarization on CNN / Daily Mail
ROUGE-2
18.02
best: 24.02 (Pegasus)
Text Summarization on CNN / Daily Mail
ROUGE-L
37.36
best: 45.35 (Scrambled code + broken (alter))