Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SVM

Reported on 77 benchmarks across 19 tasks · 11 papers · 11 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
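
To illustrate what exact-name matching implies, here is a minimal sketch. It is not the site's actual code: the function name `group_by_exact_name`, the row fields, and the sample rows are all hypothetical. It shows how two papers that both label a model "SVM" end up in one group, while a slightly different label does not.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by model name using exact string equality."""
    groups = defaultdict(list)
    for row in rows:
        # No normalization: "SVM" and "SVM (linear)" are different keys.
        groups[row["model"]].append(row)
    return dict(groups)


# Hypothetical rows, for illustration only (not taken from this page).
rows = [
    {"model": "SVM", "paper": "Paper A", "note": "RBF kernel"},
    {"model": "SVM", "paper": "Paper B", "note": "linear kernel, TF-IDF"},
    {"model": "SVM (linear)", "paper": "Paper C", "note": "linear kernel"},
]

grouped = group_by_exact_name(rows)
for name, members in grouped.items():
    print(name, [m["paper"] for m in members])
```

Papers A and B share a group even though they describe different variants, which is why results listed under one name are not necessarily comparable.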

Natural Language Processing · 44 results

Sentiment Analysis on DynaSent
10 fold Cross validation · uses extra data · 2017-08-19
1
SOTA
Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM arXiv:1708.05891
Sentiment Analysis on B-T4SA
Accuracy · 2021-02-16
95.16
best: 95.19 (AutoML-Based Fusion Approach)
An AutoML-based Approach to Multimodal Image Sentiment Analysis arXiv:2102.08092
Sentiment Analysis on TweetEval
ALL · 2020-10-23
53.5
best: 67.9 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Emoji · 2020-10-23
29.3
best: 33.4 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Emotion · 2020-10-23
64.7
best: 79.5 (RoB-RT)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Hate · 2020-10-23
36.7
best: 52.6 (LSTM)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Irony · 2020-10-23
61.7
best: 82.1 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Offensive · 2020-10-23
52.3
best: 80.5 (RoB-RT)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Sentiment · 2020-10-23
62.9
best: 73.4 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Stance · 2020-10-23
67.3
best: 71.2 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Abuse Detection on Ethos Binary
Classification Accuracy · 2020-06-11
0.6643
best: 0.8015 (BiLSTM + static BE)
ETHOS: an Online Hate Speech Detection Dataset arXiv:2006.08328
Abuse Detection on Ethos Binary
F1-score · 2020-06-11
0.6607
best: 0.7971 (BiLSTM + static BE)
ETHOS: an Online Hate Speech Detection Dataset arXiv:2006.08328
Abuse Detection on Ethos Binary
Precision · 2020-06-11
66.47
best: 79.17 (BERT)
ETHOS: an Online Hate Speech Detection Dataset arXiv:2006.08328
Hate Speech Detection on Ethos Binary
Classification Accuracy · 2020-06-11
0.6643
best: 0.8015 (BiLSTM + static BE)
ETHOS: an Online Hate Speech Detection Dataset arXiv:2006.08328
Hate Speech Detection on Ethos Binary
F1-score · 2020-06-11
0.6607
best: 0.7971 (BiLSTM + static BE)
ETHOS: an Online Hate Speech Detection Dataset arXiv:2006.08328
Hate Speech Detection on Ethos Binary
Precision · 2020-06-11
66.47
best: 79.17 (BERT)
ETHOS: an Online Hate Speech Detection Dataset arXiv:2006.08328
Humor Detection on 200k Short Texts for Humor Detection
F1-score · 2020-04-27
0.874
best: 0.982 (ColBERT model)
ColBERT: Using BERT Sentence Embedding in Parallel Neural Networks for Computational Humor arXiv:2004.12765
Abuse Detection on Waseem et al., 2018
AAA
46.51
best: 50.94 (Mozafari et al., 2019)
Abuse Detection on Waseem et al., 2018
F1 (micro)
82.18
best: 84.42 (Mozafari et al., 2019)
Hate Speech Detection on Waseem et al., 2018
AAA
46.51
best: 50.94 (Mozafari et al., 2019)
Hate Speech Detection on Waseem et al., 2018
F1 (micro)
82.18
best: 84.42 (Mozafari et al., 2019)
Cross-Lingual on Reddit Ideological and Extreme Bias Dataset
weighted-F1 score
79.1
Text Classification on MVICTOR (type)
Average F1
0.6792
best: 0.7505 (CNN + CRF)
Text Classification on MVICTOR (type)
Weighted F1
0.9288
best: 0.9537 (CNN + CRF)
Text Classification on TREC-50
Error
8.4
best: 2.8 (Rules)
Text Classification on ThreatGram 101 - Extreme Telegram Data
weighted-F1 score
64.3
best: 66.2 (GPT-2)
Text Classification on SVICTOR (type)
Average F1
0.7632
best: 0.774 (CNN + CRF)
Text Classification on SVICTOR (type)
Weighted F1
0.9425
best: 0.9533 (CNN + CRF)
Text Classification on MVICTOR (theme)
Average F1
0.6642
best: 0.8882 (XGBoost)
Text Classification on MVICTOR (theme)
Weighted F1
0.8137
best: 0.9072 (XGBoost)
Text Classification on SVICTOR (theme)
Average F1
0.8246
best: 0.8887 (XGBoost)
Text Classification on SVICTOR (theme)
Weighted F1
0.8231
best: 0.8634 (XGBoost)
Text Classification on BVICTOR
Average F1
0.7761
best: 0.8843 (XGBoost)
Text Classification on BVICTOR
Weighted F1
0.8235
best: 0.8957 (XGBoost)
Text Classification on ACL-ARC
Macro-F1
41
best: 81.75 (SS-cGAN + SciBERT)
Cross-Lingual Document Classification on Reddit Ideological and Extreme Bias Dataset
weighted-F1 score
79.1
Fact Verification on KILT: FEVER
Accuracy
70.71
best: 89.55 (Re2G)
Fact Verification on KILT: FEVER
KILT-AC
0
best: 78.53 (Re2G)
Fact Verification on KILT: FEVER
R-Prec
0
best: 88.92 (Re2G)
Fact Verification on KILT: FEVER
Recall@5
0
best: 92.52 (Re2G)
Fact Verification on KILT: FEVER
Accuracy
68.43
best: 89.55 (Re2G)
Fact Verification on KILT: FEVER
KILT-AC
0
best: 78.53 (Re2G)
Fact Verification on KILT: FEVER
R-Prec
0
best: 88.92 (Re2G)
Fact Verification on KILT: FEVER
Recall@5
0
best: 92.52 (Re2G)

Methodology · 27 results

Electroencephalogram (EEG) on PhyAAt
MAE · 2020-05-23
29.65
best: 4.75
SOTA
PhyAAt: Physiology of Auditory Attention to Speech Dataset arXiv:2005.11577
Electroencephalogram (EEG) on PhyAAt
MAE · 2020-05-23
4.75
SOTA
PhyAAt: Physiology of Auditory Attention to Speech Dataset arXiv:2005.11577
Electroencephalogram (EEG) on PhyAAt
Accuracy · 2020-05-23
81
SOTA
PhyAAt: Physiology of Auditory Attention to Speech Dataset arXiv:2005.11577
Electroencephalogram (EEG) on PhyAAt
Accuracy · 2020-05-23
56
best: 81
PhyAAt: Physiology of Auditory Attention to Speech Dataset arXiv:2005.11577
Multi-Label Classification on MIMIC-III
Micro-F1 · 2018-02-15
44.1
best: 61.2 (GKI-ICD)
Explainable Prediction of Medical Codes from Clinical Text arXiv:1802.05695
Domain Adaptation on PACS
Average Accuracy · 2017-10-09
58.74
best: 99 (SIMPLE+)
Deeper, Broader and Artier Domain Generalization arXiv:1710.03077
Multi-Label Text Classification on MVICTOR (theme)
Average F1
0.6642
best: 0.8882 (XGBoost)
Multi-Label Text Classification on MVICTOR (theme)
Weighted F1
0.8137
best: 0.9072 (XGBoost)
Multi-Label Text Classification on SVICTOR (theme)
Average F1
0.8246
best: 0.8887 (XGBoost)
Multi-Label Text Classification on SVICTOR (theme)
Weighted F1
0.8231
best: 0.8634 (XGBoost)
Multi-Label Text Classification on BVICTOR
Average F1
0.7761
best: 0.8843 (XGBoost)
Multi-Label Text Classification on BVICTOR
Weighted F1
0.8235
best: 0.8957 (XGBoost)
Classification on Reddit Ideology Database
F1-score (Weighted)
86.19
Classification on Kepler Exoplanet Search Results
F1 (%)
97.72
Classification on MVICTOR (type)
Average F1
0.6792
best: 0.7505 (CNN + CRF)
Classification on MVICTOR (type)
Weighted F1
0.9288
best: 0.9537 (CNN + CRF)
Classification on TREC-50
Error
8.4
best: 2.8 (Rules)
Classification on ThreatGram 101 - Extreme Telegram Data
weighted-F1 score
64.3
best: 66.2 (GPT-2)
Classification on SVICTOR (type)
Average F1
0.7632
best: 0.774 (CNN + CRF)
Classification on SVICTOR (type)
Weighted F1
0.9425
best: 0.9533 (CNN + CRF)
Classification on MVICTOR (theme)
Average F1
0.6642
best: 0.8882 (XGBoost)
Classification on MVICTOR (theme)
Weighted F1
0.8137
best: 0.9072 (XGBoost)
Classification on SVICTOR (theme)
Average F1
0.8246
best: 0.8887 (XGBoost)
Classification on SVICTOR (theme)
Weighted F1
0.8231
best: 0.8634 (XGBoost)
Classification on BVICTOR
Average F1
0.7761
best: 0.8843 (XGBoost)
Classification on BVICTOR
Weighted F1
0.8235
best: 0.8957 (XGBoost)
Classification on ACL-ARC
Macro-F1
41
best: 81.75 (SS-cGAN + SciBERT)

Computer Vision · 7 results

Person Re-Identification on eSports Sensors Dataset
LogLoss · 2020-11-02
0.01588
SOTA
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset arXiv:2011.00958
Skills Evaluation on eSports Sensors Dataset
Accuracy · 2020-11-02
85.6
SOTA
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset arXiv:2011.00958
Skills Evaluation on eSports Sensors Dataset
LogLoss · 2020-11-02
0.311
SOTA
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset arXiv:2011.00958
Skills Evaluation on eSports Sensors Dataset
ROC AUC · 2020-11-02
0.945
SOTA
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset arXiv:2011.00958
Person Re-Identification on eSports Sensors Dataset
Accuracy · 2020-11-02
45
best: 52.1 (Random Forest)
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset arXiv:2011.00958
Person Re-Identification on eSports Sensors Dataset
ROC AUC · 2020-11-02
0.89
best: 0.919 (Random Forest)
Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset arXiv:2011.00958
Domain Generalization on PACS
Average Accuracy · 2017-10-09
58.74
best: 99 (SIMPLE+)
Deeper, Broader and Artier Domain Generalization arXiv:1710.03077

Audio · 4 results

Instrument Recognition on IRMAS
F1-score · 2019-11-30
0.81
SOTA
Predominant Musical Instrument Classification based on Spectral Features arXiv:1912.02606
Instrument Recognition on IRMAS
Precision · 2019-11-30
0.79
SOTA
Predominant Musical Instrument Classification based on Spectral Features arXiv:1912.02606
Instrument Recognition on IRMAS
Recall · 2019-11-30
0.84
SOTA
Predominant Musical Instrument Classification based on Spectral Features arXiv:1912.02606
Emotion Recognition on SEED-IV
Accuracy · 2019-07-18
56.61
best: 79.37 (RGNN)
EEG-Based Emotion Recognition Using Regularized Graph Neural Networks arXiv:1907.07835

Medical · 1 result

Medical Code Prediction on MIMIC-III
Micro-F1 · 2018-02-15
44.1
best: 61.2 (GKI-ICD)
Explainable Prediction of Medical Codes from Clinical Text arXiv:1802.05695