Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SVM

Reported on 77 benchmarks across 19 tasks · 11 papers · 11 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 44 results

  • Sentiment Analysis on DynaSent
    10 fold Cross validation · uses extra data · 2017-08-19
    1
    SOTA
    Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM · arXiv:1708.05891
  • Sentiment Analysis on B-T4SA
    Accuracy · 2021-02-16
    95.16
    best: 95.19 (AutoML-Based Fusion Approach)
    An AutoML-based Approach to Multimodal Image Sentiment Analysis · arXiv:2102.08092
  • Sentiment Analysis on TweetEval
    ALL · 2020-10-23
    53.5
    best: 67.9 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Emoji · 2020-10-23
    29.3
    best: 33.4 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Emotion · 2020-10-23
    64.7
    best: 79.5 (RoB-RT)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Hate · 2020-10-23
    36.7
    best: 52.6 (LSTM)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Irony · 2020-10-23
    61.7
    best: 82.1 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Offensive · 2020-10-23
    52.3
    best: 80.5 (RoB-RT)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Sentiment · 2020-10-23
    62.9
    best: 73.4 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Stance · 2020-10-23
    67.3
    best: 71.2 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Abuse Detection on Ethos Binary
    Classification Accuracy · 2020-06-11
    0.6643
    best: 0.8015 (BiLSTM + static BE)
    ETHOS: an Online Hate Speech Detection Dataset · arXiv:2006.08328
  • Abuse Detection on Ethos Binary
    F1-score · 2020-06-11
    0.6607
    best: 0.7971 (BiLSTM + static BE)
    ETHOS: an Online Hate Speech Detection Dataset · arXiv:2006.08328
  • Abuse Detection on Ethos Binary
    Precision · 2020-06-11
    66.47
    best: 79.17 (BERT)
    ETHOS: an Online Hate Speech Detection Dataset · arXiv:2006.08328
  • Hate Speech Detection on Ethos Binary
    Classification Accuracy · 2020-06-11
    0.6643
    best: 0.8015 (BiLSTM + static BE)
    ETHOS: an Online Hate Speech Detection Dataset · arXiv:2006.08328
  • Hate Speech Detection on Ethos Binary
    F1-score · 2020-06-11
    0.6607
    best: 0.7971 (BiLSTM + static BE)
    ETHOS: an Online Hate Speech Detection Dataset · arXiv:2006.08328
  • Hate Speech Detection on Ethos Binary
    Precision · 2020-06-11
    66.47
    best: 79.17 (BERT)
    ETHOS: an Online Hate Speech Detection Dataset · arXiv:2006.08328
  • Humor Detection on 200k Short Texts for Humor Detection
    F1-score · 2020-04-27
    0.874
    best: 0.982 (ColBERT model)
    ColBERT: Using BERT Sentence Embedding in Parallel Neural Networks for Computational Humor · arXiv:2004.12765
  • Abuse Detection on Waseem et al., 2018
    AAA
    46.51
    best: 50.94 (Mozafari et al., 2019)
  • Abuse Detection on Waseem et al., 2018
    F1 (micro)
    82.18
    best: 84.42 (Mozafari et al., 2019)
  • Hate Speech Detection on Waseem et al., 2018
    AAA
    46.51
    best: 50.94 (Mozafari et al., 2019)
  • Hate Speech Detection on Waseem et al., 2018
    F1 (micro)
    82.18
    best: 84.42 (Mozafari et al., 2019)
  • Cross-Lingual on Reddit Ideological and Extreme Bias Dataset
    weighted-F1 score
    79.1
  • Text Classification on MVICTOR (type)
    Average F1
    0.6792
    best: 0.7505 (CNN + CRF)
  • Text Classification on MVICTOR (type)
    Weighted F1
    0.9288
    best: 0.9537 (CNN + CRF)
  • Text Classification on TREC-50
    Error
    8.4
    best: 2.8 (Rules)
  • Text Classification on ThreatGram 101 - Extreme Telegram Data
    weighted-F1 score
    64.3
    best: 66.2 (GPT-2)
  • Text Classification on SVICTOR (type)
    Average F1
    0.7632
    best: 0.774 (CNN + CRF)
  • Text Classification on SVICTOR (type)
    Weighted F1
    0.9425
    best: 0.9533 (CNN + CRF)
  • Text Classification on MVICTOR (theme)
    Average F1
    0.6642
    best: 0.8882 (XGBoost)
  • Text Classification on MVICTOR (theme)
    Weighted F1
    0.8137
    best: 0.9072 (XGBoost)
  • Text Classification on SVICTOR (theme)
    Average F1
    0.8246
    best: 0.8887 (XGBoost)
  • Text Classification on SVICTOR (theme)
    Weighted F1
    0.8231
    best: 0.8634 (XGBoost)
  • Text Classification on BVICTOR
    Average F1
    0.7761
    best: 0.8843 (XGBoost)
  • Text Classification on BVICTOR
    Weighted F1
    0.8235
    best: 0.8957 (XGBoost)
  • Text Classification on ACL-ARC
    Macro-F1
    41
    best: 81.75 (SS-cGAN + SciBERT)
  • Cross-Lingual Document Classification on Reddit Ideological and Extreme Bias Dataset
    weighted-F1 score
    79.1
  • Fact Verification on KILT: FEVER
    Accuracy
    70.71
    best: 89.55 (Re2G)
  • Fact Verification on KILT: FEVER
    KILT-AC
    0
    best: 78.53 (Re2G)
  • Fact Verification on KILT: FEVER
    R-Prec
    0
    best: 88.92 (Re2G)
  • Fact Verification on KILT: FEVER
    Recall@5
    0
    best: 92.52 (Re2G)
  • Fact Verification on KILT: FEVER
    Accuracy
    68.43
    best: 89.55 (Re2G)
  • Fact Verification on KILT: FEVER
    KILT-AC
    0
    best: 78.53 (Re2G)
  • Fact Verification on KILT: FEVER
    R-Prec
    0
    best: 88.92 (Re2G)
  • Fact Verification on KILT: FEVER
    Recall@5
    0
    best: 92.52 (Re2G)

Methodology · 27 results

  • Electroencephalogram (EEG) on PhyAAt
    MAE · 2020-05-23
    29.65
    best: 4.75
    SOTA
    PhyAAt: Physiology of Auditory Attention to Speech Dataset · arXiv:2005.11577
  • Electroencephalogram (EEG) on PhyAAt
    MAE · 2020-05-23
    4.75
    SOTA
    PhyAAt: Physiology of Auditory Attention to Speech Dataset · arXiv:2005.11577
  • Electroencephalogram (EEG) on PhyAAt
    Accuracy · 2020-05-23
    81
    SOTA
    PhyAAt: Physiology of Auditory Attention to Speech Dataset · arXiv:2005.11577
  • Electroencephalogram (EEG) on PhyAAt
    Accuracy · 2020-05-23
    56
    best: 81
    PhyAAt: Physiology of Auditory Attention to Speech Dataset · arXiv:2005.11577
  • Multi-Label Classification on MIMIC-III
    Micro-F1 · 2018-02-15
    44.1
    best: 61.2 (GKI-ICD)
    Explainable Prediction of Medical Codes from Clinical Text · arXiv:1802.05695
  • Domain Adaptation on PACS
    Average Accuracy · 2017-10-09
    58.74
    best: 99 (SIMPLE+)
    Deeper, Broader and Artier Domain Generalization · arXiv:1710.03077
  • Multi-Label Text Classification on MVICTOR (theme)
    Average F1
    0.6642
    best: 0.8882 (XGBoost)
  • Multi-Label Text Classification on MVICTOR (theme)
    Weighted F1
    0.8137
    best: 0.9072 (XGBoost)
  • Multi-Label Text Classification on SVICTOR (theme)
    Average F1
    0.8246
    best: 0.8887 (XGBoost)
  • Multi-Label Text Classification on SVICTOR (theme)
    Weighted F1
    0.8231
    best: 0.8634 (XGBoost)
  • Multi-Label Text Classification on BVICTOR
    Average F1
    0.7761
    best: 0.8843 (XGBoost)
  • Multi-Label Text Classification on BVICTOR
    Weighted F1
    0.8235
    best: 0.8957 (XGBoost)
  • Classification on Reddit Ideology Database
    F1-score (Weighted)
    86.19
  • Classification on Kepler Exoplanet Search Results
    F1 (%)
    97.72
  • Classification on MVICTOR (type)
    Average F1
    0.6792
    best: 0.7505 (CNN + CRF)
  • Classification on MVICTOR (type)
    Weighted F1
    0.9288
    best: 0.9537 (CNN + CRF)
  • Classification on TREC-50
    Error
    8.4
    best: 2.8 (Rules)
  • Classification on ThreatGram 101 - Extreme Telegram Data
    weighted-F1 score
    64.3
    best: 66.2 (GPT-2)
  • Classification on SVICTOR (type)
    Average F1
    0.7632
    best: 0.774 (CNN + CRF)
  • Classification on SVICTOR (type)
    Weighted F1
    0.9425
    best: 0.9533 (CNN + CRF)
  • Classification on MVICTOR (theme)
    Average F1
    0.6642
    best: 0.8882 (XGBoost)
  • Classification on MVICTOR (theme)
    Weighted F1
    0.8137
    best: 0.9072 (XGBoost)
  • Classification on SVICTOR (theme)
    Average F1
    0.8246
    best: 0.8887 (XGBoost)
  • Classification on SVICTOR (theme)
    Weighted F1
    0.8231
    best: 0.8634 (XGBoost)
  • Classification on BVICTOR
    Average F1
    0.7761
    best: 0.8843 (XGBoost)
  • Classification on BVICTOR
    Weighted F1
    0.8235
    best: 0.8957 (XGBoost)
  • Classification on ACL-ARC
    Macro-F1
    41
    best: 81.75 (SS-cGAN + SciBERT)

Computer Vision · 7 results

  • Person Re-Identification on eSports Sensors Dataset
    LogLoss · 2020-11-02
    0.01588
    SOTA
    Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset · arXiv:2011.00958
  • Skills Evaluation on eSports Sensors Dataset
    Accuracy · 2020-11-02
    85.6
    SOTA
    Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset · arXiv:2011.00958
  • Skills Evaluation on eSports Sensors Dataset
    LogLoss · 2020-11-02
    0.311
    SOTA
    Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset · arXiv:2011.00958
  • Skills Evaluation on eSports Sensors Dataset
    ROC AUC · 2020-11-02
    0.945
    SOTA
    Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset · arXiv:2011.00958
  • Person Re-Identification on eSports Sensors Dataset
    Accuracy · 2020-11-02
    45
    best: 52.1 (Random Forest)
    Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset · arXiv:2011.00958
  • Person Re-Identification on eSports Sensors Dataset
    ROC AUC · 2020-11-02
    0.89
    best: 0.919 (Random Forest)
    Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset · arXiv:2011.00958
  • Domain Generalization on PACS
    Average Accuracy · 2017-10-09
    58.74
    best: 99 (SIMPLE+)
    Deeper, Broader and Artier Domain Generalization · arXiv:1710.03077

Audio · 4 results

  • Instrument Recognition on IRMAS
    F1-score · 2019-11-30
    0.81
    SOTA
    Predominant Musical Instrument Classification based on Spectral Features · arXiv:1912.02606
  • Instrument Recognition on IRMAS
    Precision · 2019-11-30
    0.79
    SOTA
    Predominant Musical Instrument Classification based on Spectral Features · arXiv:1912.02606
  • Instrument Recognition on IRMAS
    Recall · 2019-11-30
    0.84
    SOTA
    Predominant Musical Instrument Classification based on Spectral Features · arXiv:1912.02606
  • Emotion Recognition on SEED-IV
    Accuracy · 2019-07-18
    56.61
    best: 79.37 (RGNN)
    EEG-Based Emotion Recognition Using Regularized Graph Neural Networks · arXiv:1907.07835

Medical · 1 result

  • Medical Code Prediction on MIMIC-III
    Micro-F1 · 2018-02-15
    44.1
    best: 61.2 (GKI-ICD)
    Explainable Prediction of Medical Codes from Clinical Text · arXiv:1802.05695