Sentiment Analysis on SST-2 Binary classification
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | SOTA | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | T5-11B | 97.5 | SOTA | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 2 | MT-DNN-SMART | 97.5 | - | No | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | 2019-11-08 | Code |
| 3 | T5-3B | 97.4 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 4 | MUPPET Roberta Large | 97.4 | - | No | Muppet: Massive Multi-task Representations with Pre-Finetuning | 2021-01-26 | Code |
| 5 | ALBERT | 97.1 | - | No | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 2019-09-26 | Code |
| 6 | StructBERTRoBERTa ensemble | 97.1 | SOTA | No | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | 2019-08-13 | - |
| 7 | XLNet (single model) | 97 | SOTA | No | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 2019-06-19 | Code |
| 8 | ELECTRA | 96.9 | - | No | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | 2020-03-23 | Code |
| 9 | RoBERTa-large 355M + Entailment as Few-shot Learner | 96.9 | - | No | Entailment as Few-Shot Learner | 2021-04-29 | Code |
| 10 | XLNet-Large (ensemble) | 96.8 | - | No | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 2019-06-19 | Code |
| 11 | FLOATER-large | 96.7 | - | No | Learning to Encode Position for Transformer with Continuous Dynamical Model | 2020-03-13 | Code |
| 12 | MUPPET Roberta base | 96.7 | - | No | Muppet: Massive Multi-task Representations with Pre-Finetuning | 2021-01-26 | Code |
| 13 | RoBERTa (ensemble) | 96.7 | - | No | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Code |
| 14 | DeBERTa (large) | 96.5 | - | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Code |
| 15 | MT-DNN-ensemble | 96.5 | SOTA | No | Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | 2019-04-20 | Code |
| 16 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 96.4 | - | No | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | 2022-08-15 | Code |
| 17 | ASA + RoBERTa | 96.3 | - | No | Adversarial Self-Attention for Language Understanding | 2022-06-25 | Code |
| 18 | T5-Large 770M | 96.3 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 19 | Snorkel MeTaL(ensemble) | 96.2 | SOTA | No | Training Complex Models with Multi-Task Weak Supervision | 2018-10-05 | Code |
| 20 | PSQ (Chen et al., 2020) | 96.2 | - | No | A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | 2020-10-27 | Code |
| 21 | Heinsen Routing + RoBERTa-large | 96 | - | Yes | An Algorithm for Routing Vectors in Sequences | 2022-11-20 | Code |
| 22 | MT-DNN | 95.6 | - | No | Multi-Task Deep Neural Networks for Natural Language Understanding | 2019-01-31 | Code |
| 23 | Heinsen Routing + GPT-2 | 95.6 | - | Yes | An Algorithm for Routing Capsules in All Domains | 2019-11-02 | Code |
| 24 | T5-Base | 95.2 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 25 | ERNIE 2.0 Base | 95 | - | No | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 2019-07-29 | Code |
| 26 | RoBERTa+DualCL | 94.91 | - | No | Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation | 2022-01-21 | Code |
| 27 | BERT-LARGE | 94.9 | - | No | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018-10-11 | Code |
| 28 | RoBERTa + SubRegWeigh (K-means) | 94.84 | - | No | SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization | 2024-09-10 | Code |
| 29 | SpanBERT | 94.8 | - | No | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 2019-07-24 | Code |
| 30 | gMLP-large | 94.8 | - | No | Pay Attention to MLPs | 2021-05-17 | Code |
| 31 | Q-BERT (Shen et al., 2020) | 94.8 | - | No | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | 2019-09-12 | - |
| 32 | Q8BERT (Zafrir et al., 2019) | 94.7 | - | No | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Code |
| 33 | CNN Large | 94.6 | - | No | Cloze-driven Pretraining of Self-attention Networks | 2019-03-19 | - |
| 34 | BigBird | 94.6 | - | No | Big Bird: Transformers for Longer Sequences | 2020-07-28 | Code |
| 35 | MLM+ del-word+ reorder | 94.5 | - | No | CLEAR: Contrastive Learning for Sentence Representation | 2020-12-31 | - |
| 36 | ASA + BERT-base | 94.1 | - | No | Adversarial Self-Attention for Language Understanding | 2022-06-25 | Code |
| 37 | RealFormer | 94.04 | - | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 38 | FNet-Large | 94 | - | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 39 | MT-DNN | 93.6 | - | No | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | 2019-11-08 | Code |
| 40 | ERNIE | 93.5 | - | No | ERNIE: Enhanced Language Representation with Informative Entities | 2019-05-17 | Code |
| 41 | Block-sparse LSTM | 93.2 | - | No | No paper | - | Code |
| 42 | LM-CPPF RoBERTa-base | 93.2 | - | No | LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | 2023-05-29 | Code |
| 43 | TinyBERT-6 67M | 93.1 | - | No | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Code |
| 44 | 24hBERT | 93 | - | No | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 45 | SMART+BERT-BASE | 93 | - | No | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | 2019-11-08 | Code |
| 46 | TinyBERT-4 14.5M | 92.6 | - | No | TinyBERT: Distilling BERT for Natural Language Understanding | 2019-09-23 | Code |
| 47 | bmLSTM | 91.8 | SOTA | No | Learning to Generate Reviews and Discovering Sentiment | 2017-04-05 | Code |
| 48 | T5-Small | 91.8 | - | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Code |
| 49 | byte mLSTM7 | 91.7 | - | No | A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors | 2018-05-14 | Code |
| 50 | PAR BERT Base | 91.6 | - | No | Pay Attention when Required | 2020-09-09 | Code |
| 51 | Charformer-Base | 91.6 | - | No | Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | 2021-06-23 | Code |
| 52 | SqueezeBERT | 91.4 | - | No | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | 2020-06-19 | Code |
| 53 | Nyströmformer | 91.4 | - | No | Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | 2021-02-07 | Code |
| 54 | Bi-CAS-LSTM | 91.3 | - | No | Cell-aware Stacked LSTMs for Modeling Sentences | 2018-09-07 | - |
| 55 | DistilBERT 66M | 91.3 | - | No | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 2019-10-02 | Code |
| 56 | CNN | 91.2 | - | No | On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis | 2017-07-06 | Code |
| 57 | Suffix BiLSTM | 91.2 | - | No | Improved Sentence Modeling using Suffix Bidirectional LSTM | 2018-05-18 | - |
| 58 | BERT Base | 91.2 | - | No | Fine-grained Sentiment Classification using BERT | 2019-10-04 | Code |
| 59 | Transformer (finetune) | 90.9 | - | No | Practical Text Classification With Large Pre-Trained Language Models | 2018-12-04 | Code |
| 60 | Single layer bilstm distilled from BERT | 90.7 | - | No | Distilling Task-Specific Knowledge from BERT into Simple Neural Networks | 2019-03-28 | Code |
| 61 | BCN+Char+CoVe | 90.3 | - | No | Learned in Translation: Contextualized Word Vectors | 2017-08-01 | Code |
| 62 | CNN-RNF-LSTM | 90 | - | No | Convolutional Neural Networks with Recurrent Neural Filters | 2018-08-28 | Code |
| 63 | Neural Semantic Encoder | 89.7 | SOTA | No | Neural Semantic Encoders | 2016-07-14 | Code |
| 64 | BLSTM-2DCNN | 89.5 | - | No | Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling | 2016-11-21 | Code |
| 65 | CNN + Logic rules | 89.3 | SOTA | No | Harnessing Deep Neural Networks with Logic Rules | 2016-03-21 | Code |
| 66 | DMN [ankit16] | 88.6 | SOTA | No | Ask Me Anything: Dynamic Memory Networks for Natural Language Processing | 2015-06-24 | Code |
| 67 | CNN-multichannel [kim2013] | 88.1 | SOTA | No | Convolutional Neural Networks for Sentence Classification | 2014-08-25 | Code |
| 68 | Consistency Tree LSTM with tuned Glove vectors [tai2015improved] | 88 | - | No | Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks | 2015-02-28 | Code |
| 69 | C-LSTM | 87.8 | - | No | A C-LSTM Neural Network for Text Classification | 2015-11-27 | Code |
| 70 | MPAD-path | 87.75 | - | No | Message Passing Attention Networks for Document Understanding | 2019-08-17 | Code |
| 71 | Standard DR-AGG | 87.6 | - | No | Information Aggregation via Dynamic Routing for Sequence Encoding | 2018-06-05 | Code |
| 72 | USE_T+CNN (lrn w.e.) | 87.21 | - | No | Universal Sentence Encoder | 2018-03-29 | Code |
| 73 | Reverse DR-AGG | 87.2 | - | No | Information Aggregation via Dynamic Routing for Sequence Encoding | 2018-06-05 | Code |
| 74 | DC-MCNN | 86.99 | - | No | No paper | - | - |
| 75 | STM+TSED+PT+2L | 86.95 | - | No | The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning | 2019-05-31 | Code |
| 76 | Capsule-B | 86.8 | - | No | Investigating Capsule Networks with Dynamic Routing for Text Classification | 2018-03-29 | Code |
| 77 | 2-layer LSTM [tai2015improved] | 86.3 | - | No | Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks | 2015-02-28 | Code |
| 78 | SWEM-concat | 84.3 | - | No | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | 2018-05-24 | Code |
| 79 | MV-RNN | 82.9 | - | No | No paper | - | Code |
| 80 | GloVe+Emo2Vec | 82.3 | - | No | Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training | 2018-09-12 | Code |
| 81 | Emo2Vec | 81.2 | - | No | Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training | 2018-09-12 | Code |
| 82 | ToWE-CBOW | 78.8 | - | No | No paper | - | Code |
| 83 | Joined Model Multi-tasking | 54.72 | - | No | No paper | - | - |