Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

RoBERTa

Reported on 67 benchmarks across 26 tasks · 10 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
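
A minimal sketch of what exact-name matching implies, assuming each result is stored as a (model name, benchmark, metric, value) tuple. The tuple layout and the function name `group_by_exact_name` are hypothetical and are not the site's actual implementation; the values are copied from the listings on this page.

```python
from collections import defaultdict

# Illustrative rows only. The values come from this page; the tuple layout
# (model name, benchmark, metric, value) is an assumption for the example.
ROWS = [
    ("RoBERTa", "RACE", "Accuracy", 83.2),
    ("RoBERTa", "CodeSearchNet - Go", "Smoothed BLEU-4", 26.09),
    ("CodeBERT (MLM)", "CodeSearchNet - Go", "Smoothed BLEU-4", 26.79),
    ("CodeBERT (MLM+RTD)", "CodeSearchNet", "Smoothed BLEU-4", 15.99),
]


def group_by_exact_name(rows):
    """Group result rows under the exact model-name string they were reported with."""
    groups = defaultdict(list)
    for name, benchmark, metric, value in rows:
        # No normalisation: names that differ by a single character
        # end up in different groups.
        groups[name].append((benchmark, metric, value))
    return dict(groups)


groups = group_by_exact_name(ROWS)
print(len(groups["RoBERTa"]))  # rows from different papers share one group
print(sorted(groups))          # the two CodeBERT variants stay separate
```

Under this scheme, rows from different papers that both report "RoBERTa" land in the same group even if the underlying checkpoints differ, while "CodeBERT (MLM)" and "CodeBERT (MLM+RTD)" are treated as unrelated models.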

Natural Language Processing · 36 results

Text Classification on NICE-2
Accuracy · 2022-11-30
99.76
SOTA
Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets arXiv:2211.16878
Binary text classification on TURINGBENCH (Turing Test, FAIR_wmt20)
F1 score · 2021-09-27
0.4531
best: 0.9966 (GigaCheck (Mistral-7B))
SOTA
TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation arXiv:2109.13296
Binary text classification on TURINGBENCH (Turing Test, GPT-3)
F1 score · 2021-09-27
0.5209
best: 0.9709 (GigaCheck (Mistral-7B))
SOTA
TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation arXiv:2109.13296
Reading Comprehension on RACE
Accuracy · 2019-07-26
83.2
best: 91.4 (ALBERT (Ensemble))
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Common Sense Reasoning on SWAG
Test · 2019-07-26
89.9
best: 90.8 (DeBERTalarge)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Text Classification on arXiv-10
Accuracy · 2019-07-26
0.779
best: 0.794 (Protoformer)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Text Classification on UK Key Stage Readability
F1 · 2024-11-26
73.1
best: 99.6 (ELECTRA + ANN)
What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics arXiv:2411.17593
Text Classification on MR
Accuracy · 2022-11-30
89.42
best: 93.3 (VLAWE)
Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets arXiv:2211.16878
Relation Extraction on SemEval-2010 Task-8
F1 · 2022-08-20
88.7
best: 91.9 (SP)
SPOT: Knowledge-Enhanced Language Representations for Information Extraction arXiv:2208.09625
Natural Language Understanding on LexGLUE
CaseHOLD · 2021-10-03
71.7
best: 75.6 (CaseLaw-BERT)
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English arXiv:2110.00976
Code Generation on CodeSearchNet
Smoothed BLEU-4 · 2020-02-19
14.52
best: 15.99 (CodeBERT (MLM+RTD))
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Generation on CodeSearchNet - Python
Smoothed BLEU-4 · 2020-02-19
14.92
best: 20.39 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Generation on CodeSearchNet - Go
Smoothed BLEU-4 · 2020-02-19
26.09
best: 26.79 (CodeBERT (MLM))
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Generation on CodeSearchNet - JavaScript
Smoothed BLEU-4 · 2020-02-19
5.72
best: 25.61 (Transformer)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Generation on CodeSearchNet - Php
Smoothed BLEU-4 · 2020-02-19
19.9
best: 26.23 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Generation on CodeSearchNet - Java
Smoothed BLEU-4 · 2020-02-19
13.2
best: 21.87 (CodeTrans-MT-Large)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Generation on CodeSearchNet - Ruby
Smoothed BLEU-4 · 2020-02-19
7.26
best: 15.26 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Relation Extraction on TACRED
F1 · 2020-02-05
71.3
best: 86.6 (RAG4RE)
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters arXiv:2002.01808
Relation Classification on TACRED
F1 · 2020-02-05
71.3
best: 76.8 (DeepStruct multi-task w/ finetune)
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters arXiv:2002.01808
Negation Detection on *sem 2012 Shared Task: Sherlock Dataset
F1 · uses extra data · 2020-01-09
91.59
best: 97.26 (NegBioELECTRA)
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Reading Comprehension on RACE
Accuracy (High) · 2019-07-26
81.3
best: 92.6 (ALBERTxxlarge+DUMA(ensemble))
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Reading Comprehension on RACE
Accuracy (Middle) · 2019-07-26
86.5
best: 93.1 (Megatron-BERT (ensemble))
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Natural Language Inference on MultiNLI
Matched · 2019-07-26
90.8
best: 92.6 (Turing NLR v5 XXL 5.4B (fine-tuned))
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Semantic Textual Similarity on STS Benchmark
Pearson Correlation · 2019-07-26
0.922
best: 0.929 (MT-DNN-SMART)
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Question Answering on CronQuestions
Hits@1
22.5
best: 97.8 (GenTKGQA)
Abuse Detection on HopeEDI
Weighted Average F1-score
0.93
Abuse Detection on HopeEDI
Weighted Average F1-score
0.93
Hate Speech Detection on HopeEDI
Weighted Average F1-score
0.93
Hate Speech Detection on HopeEDI
Weighted Average F1-score
0.93
Cross-Lingual on Reddit Ideological and Extreme Bias Dataset
weighted-F1 score
75.2
best: 79.1 (SVM)
Abstractive Text Summarization on EDUsum
ROUGE-1
63.22
best: 64.48 (GP_Step_Sim)
Abstractive Text Summarization on EDUsum
ROUGE-2
51.34
best: 52.7 (GP_Step_Sim)
Abstractive Text Summarization on EDUsum
ROUGE-L
60.26
best: 61.91 (GP_Step_Sim)
Cross-Lingual Document Classification on Reddit Ideological and Extreme Bias Dataset
weighted-F1 score
75.2
best: 79.1 (SVM)
Hope Speech Detection on HopeEDI
Weighted Average F1-score
0.93
Hope Speech Detection on HopeEDI
Weighted Average F1-score
0.93

Computer Code · 15 results

Program Synthesis on ManyTypes4TypeScript
Average Accuracy · 2019-07-26
59.84
best: 71.27 (CodeTIDAL5)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Program Synthesis on ManyTypes4TypeScript
Average F1 · 2019-07-26
57.54
best: 60.57 (GraphCodeBERT)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Program Synthesis on ManyTypes4TypeScript
Average Precision · 2019-07-26
57.45
best: 60.06 (GraphCodeBERT)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Program Synthesis on ManyTypes4TypeScript
Average Recall · 2019-07-26
57.62
best: 61.08 (GraphCodeBERT)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Type prediction on ManyTypes4TypeScript
Average Accuracy · 2019-07-26
59.84
best: 71.27 (CodeTIDAL5)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Type prediction on ManyTypes4TypeScript
Average F1 · 2019-07-26
57.54
best: 60.57 (GraphCodeBERT)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Type prediction on ManyTypes4TypeScript
Average Precision · 2019-07-26
57.45
best: 60.06 (GraphCodeBERT)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Type prediction on ManyTypes4TypeScript
Average Recall · 2019-07-26
57.62
best: 61.08 (GraphCodeBERT)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Code Documentation Generation on CodeSearchNet
Smoothed BLEU-4 · 2020-02-19
14.52
best: 15.99 (CodeBERT (MLM+RTD))
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Documentation Generation on CodeSearchNet - Python
Smoothed BLEU-4 · 2020-02-19
14.92
best: 20.39 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Documentation Generation on CodeSearchNet - Go
Smoothed BLEU-4 · 2020-02-19
26.09
best: 26.79 (CodeBERT (MLM))
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Documentation Generation on CodeSearchNet - JavaScript
Smoothed BLEU-4 · 2020-02-19
5.72
best: 25.61 (Transformer)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Documentation Generation on CodeSearchNet - Php
Smoothed BLEU-4 · 2020-02-19
19.9
best: 26.23 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Documentation Generation on CodeSearchNet - Java
Smoothed BLEU-4 · 2020-02-19
13.2
best: 21.87 (CodeTrans-MT-Large)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Code Documentation Generation on CodeSearchNet - Ruby
Smoothed BLEU-4 · 2020-02-19
7.26
best: 15.26 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155

Methodology · 9 results

Classification on NICE-2
Accuracy · 2022-11-30
99.76
SOTA
Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets arXiv:2211.16878
Classification on arXiv-10
Accuracy · 2019-07-26
0.779
best: 0.794 (Protoformer)
SOTA
RoBERTa: A Robustly Optimized BERT Pretraining Approach arXiv:1907.11692
Classification on UK Key Stage Readability
F1 · 2024-11-26
73.1
best: 99.6 (ELECTRA + ANN)
What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics arXiv:2411.17593
Data Mining on IMDb Movie Reviews
Accuracy · 2023-08-07
95.3
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Data Mining on IMDb Movie Reviews
F1 · 2023-08-07
95.3
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Interpretable Machine Learning on IMDb Movie Reviews
Accuracy · 2023-08-07
95.3
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Interpretable Machine Learning on IMDb Movie Reviews
F1 · 2023-08-07
95.3
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Classification on MR
Accuracy · 2022-11-30
89.42
best: 93.3 (VLAWE)
Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets arXiv:2211.16878
Classification on Reddit Ideology Database
F1-score (Weighted)
78.13
best: 86.19 (SVM)

Adversarial · 7 results

Text Generation on CodeSearchNet
Smoothed BLEU-4 · 2020-02-19
14.52
best: 15.99 (CodeBERT (MLM+RTD))
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Text Generation on CodeSearchNet - Python
Smoothed BLEU-4 · 2020-02-19
14.92
best: 20.39 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Text Generation on CodeSearchNet - Go
Smoothed BLEU-4 · 2020-02-19
26.09
best: 26.79 (CodeBERT (MLM))
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Text Generation on CodeSearchNet - JavaScript
Smoothed BLEU-4 · 2020-02-19
5.72
best: 25.61 (Transformer)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Text Generation on CodeSearchNet - Php
Smoothed BLEU-4 · 2020-02-19
19.9
best: 26.23 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Text Generation on CodeSearchNet - Java
Smoothed BLEU-4 · 2020-02-19
13.2
best: 21.87 (CodeTrans-MT-Large)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155
Text Generation on CodeSearchNet - Ruby
Smoothed BLEU-4 · 2020-02-19
7.26
best: 15.26 (CodeTrans-MT-Base)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages arXiv:2002.08155

Knowledge Base · 3 results

Text Summarization on EDUsum
ROUGE-1
63.22
best: 64.48 (GP_Step_Sim)
Text Summarization on EDUsum
ROUGE-2
51.34
best: 52.7 (GP_Step_Sim)
Text Summarization on EDUsum
ROUGE-L
60.26
best: 61.91 (GP_Step_Sim)