Sentiment Analysis on SST-2 Binary classification

Metric: Accuracy (higher is better)
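
As a minimal sketch of the metric: accuracy on a binary task such as SST-2 is the percentage of examples whose predicted label matches the gold label. The function name `accuracy_percent` and the toy labels below are illustrative assumptions, not part of the leaderboard or of any submission's evaluation code.

```python
from typing import Sequence


def accuracy_percent(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Return the percentage of predictions that match the gold labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label equals the gold label.
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


# Toy labels (hypothetical): 1 = positive, 0 = negative.
gold_labels = [1, 0, 1, 1, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 1, 0, 1, 1, 0]
score = accuracy_percent(gold_labels, pred_labels)
print(f"{score:.1f}")
```

Scores in the table are reported on this 0 to 100 scale, so a higher value means more test sentences classified correctly.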

Results

#	Model	Accuracy	Extra Data	SOTA	Paper	Date	Code
1	T5-11B	97.5	No	SOTA	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	2019-10-23	Code
2	MT-DNN-SMART	97.5	No	-	SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization	2019-11-08	Code
3	T5-3B	97.4	No	-	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	2019-10-23	Code
4	MUPPET Roberta Large	97.4	No	-	Muppet: Massive Multi-task Representations with Pre-Finetuning	2021-01-26	Code
5	ALBERT	97.1	No	-	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations	2019-09-26	Code
6	StructBERTRoBERTa ensemble	97.1	No	SOTA	StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding	2019-08-13	-
7	XLNet (single model)	97	No	SOTA	XLNet: Generalized Autoregressive Pretraining for Language Understanding	2019-06-19	Code
8	ELECTRA	96.9	No	-	ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators	2020-03-23	Code
9	RoBERTa-large 355M + Entailment as Few-shot Learner	96.9	No	-	Entailment as Few-Shot Learner	2021-04-29	Code
10	XLNet-Large (ensemble)	96.8	No	-	XLNet: Generalized Autoregressive Pretraining for Language Understanding	2019-06-19	Code
11	FLOATER-large	96.7	No	-	Learning to Encode Position for Transformer with Continuous Dynamical Model	2020-03-13	Code
12	MUPPET Roberta base	96.7	No	-	Muppet: Massive Multi-task Representations with Pre-Finetuning	2021-01-26	Code
13	RoBERTa (ensemble)	96.7	No	-	RoBERTa: A Robustly Optimized BERT Pretraining Approach	2019-07-26	Code
14	DeBERTa (large)	96.5	No	-	DeBERTa: Decoding-enhanced BERT with Disentangled Attention	2020-06-05	Code
15	MT-DNN-ensemble	96.5	No	SOTA	Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding	2019-04-20	Code
16	RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned)	96.4	No	-	LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale	2022-08-15	Code
17	ASA + RoBERTa	96.3	No	-	Adversarial Self-Attention for Language Understanding	2022-06-25	Code
18	T5-Large 770M	96.3	No	-	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	2019-10-23	Code
19	Snorkel MeTaL(ensemble)	96.2	No	SOTA	Training Complex Models with Multi-Task Weak Supervision	2018-10-05	Code
20	PSQ (Chen et al., 2020)	96.2	No	-	A Statistical Framework for Low-bitwidth Training of Deep Neural Networks	2020-10-27	Code
21	Heinsen Routing + RoBERTa-large	96	Yes	-	An Algorithm for Routing Vectors in Sequences	2022-11-20	Code
22	MT-DNN	95.6	No	-	Multi-Task Deep Neural Networks for Natural Language Understanding	2019-01-31	Code
23	Heinsen Routing + GPT-2	95.6	Yes	-	An Algorithm for Routing Capsules in All Domains	2019-11-02	Code
24	T5-Base	95.2	No	-	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	2019-10-23	Code
25	ERNIE 2.0 Base	95	No	-	ERNIE 2.0: A Continual Pre-training Framework for Language Understanding	2019-07-29	Code
26	RoBERTa+DualCL	94.91	No	-	Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation	2022-01-21	Code
27	BERT-LARGE	94.9	No	-	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	2018-10-11	Code
28	RoBERTa + SubRegWeigh (K-means)	94.84	No	-	SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization	2024-09-10	Code
29	SpanBERT	94.8	No	-	SpanBERT: Improving Pre-training by Representing and Predicting Spans	2019-07-24	Code
30	gMLP-large	94.8	No	-	Pay Attention to MLPs	2021-05-17	Code
31	Q-BERT (Shen et al., 2020)	94.8	No	-	Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT	2019-09-12	-
32	Q8BERT (Zafrir et al., 2019)	94.7	No	-	Q8BERT: Quantized 8Bit BERT	2019-10-14	Code
33	CNN Large	94.6	No	-	Cloze-driven Pretraining of Self-attention Networks	2019-03-19	-
34	BigBird	94.6	No	-	Big Bird: Transformers for Longer Sequences	2020-07-28	Code
35	MLM+ del-word+ reorder	94.5	No	-	CLEAR: Contrastive Learning for Sentence Representation	2020-12-31	-
36	ASA + BERT-base	94.1	No	-	Adversarial Self-Attention for Language Understanding	2022-06-25	Code
37	RealFormer	94.04	No	-	RealFormer: Transformer Likes Residual Attention	2020-12-21	Code
38	FNet-Large	94	No	-	FNet: Mixing Tokens with Fourier Transforms	2021-05-09	Code
39	MT-DNN	93.6	No	-	SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization	2019-11-08	Code
40	ERNIE	93.5	No	-	ERNIE: Enhanced Language Representation with Informative Entities	2019-05-17	Code
41	Block-sparse LSTM	93.2	No	-	-	-	Code
42	LM-CPPF RoBERTa-base	93.2	No	-	LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning	2023-05-29	Code
43	TinyBERT-6 67M	93.1	No	-	TinyBERT: Distilling BERT for Natural Language Understanding	2019-09-23	Code
44	24hBERT	93	No	-	How to Train BERT with an Academic Budget	2021-04-15	Code
45	SMART+BERT-BASE	93	No	-	SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization	2019-11-08	Code
46	TinyBERT-4 14.5M	92.6	No	-	TinyBERT: Distilling BERT for Natural Language Understanding	2019-09-23	Code
47	bmLSTM	91.8	No	SOTA	Learning to Generate Reviews and Discovering Sentiment	2017-04-05	Code
48	T5-Small	91.8	No	-	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	2019-10-23	Code
49	byte mLSTM7	91.7	No	-	A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors	2018-05-14	Code
50	PAR BERT Base	91.6	No	-	Pay Attention when Required	2020-09-09	Code
51	Charformer-Base	91.6	No	-	Charformer: Fast Character Transformers via Gradient-based Subword Tokenization	2021-06-23	Code
52	SqueezeBERT	91.4	No	-	SqueezeBERT: What can computer vision teach NLP about efficient neural networks?	2020-06-19	Code
53	Nyströmformer	91.4	No	-	Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention	2021-02-07	Code
54	Bi-CAS-LSTM	91.3	No	-	Cell-aware Stacked LSTMs for Modeling Sentences	2018-09-07	-
55	DistilBERT 66M	91.3	No	-	DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	2019-10-02	Code
56	CNN	91.2	No	-	On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis	2017-07-06	Code
57	Suffix BiLSTM	91.2	No	-	Improved Sentence Modeling using Suffix Bidirectional LSTM	2018-05-18	-
58	BERT Base	91.2	No	-	Fine-grained Sentiment Classification using BERT	2019-10-04	Code
59	Transformer (finetune)	90.9	No	-	Practical Text Classification With Large Pre-Trained Language Models	2018-12-04	Code
60	Single layer bilstm distilled from BERT	90.7	No	-	Distilling Task-Specific Knowledge from BERT into Simple Neural Networks	2019-03-28	Code
61	BCN+Char+CoVe	90.3	No	-	Learned in Translation: Contextualized Word Vectors	2017-08-01	Code
62	CNN-RNF-LSTM	90	No	-	Convolutional Neural Networks with Recurrent Neural Filters	2018-08-28	Code
63	Neural Semantic Encoder	89.7	No	SOTA	Neural Semantic Encoders	2016-07-14	Code
64	BLSTM-2DCNN	89.5	No	-	Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling	2016-11-21	Code
65	CNN + Logic rules	89.3	No	SOTA	Harnessing Deep Neural Networks with Logic Rules	2016-03-21	Code
66	DMN [ankit16]	88.6	No	SOTA	Ask Me Anything: Dynamic Memory Networks for Natural Language Processing	2015-06-24	Code
67	CNN-multichannel [kim2013]	88.1	No	SOTA	Convolutional Neural Networks for Sentence Classification	2014-08-25	Code
68	Consistency Tree LSTM with tuned Glove vectors [tai2015improved]	88	No	-	Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks	2015-02-28	Code
69	C-LSTM	87.8	No	-	A C-LSTM Neural Network for Text Classification	2015-11-27	Code
70	MPAD-path	87.75	No	-	Message Passing Attention Networks for Document Understanding	2019-08-17	Code
71	Standard DR-AGG	87.6	No	-	Information Aggregation via Dynamic Routing for Sequence Encoding	2018-06-05	Code
72	USE_T+CNN (lrn w.e.)	87.21	No	-	Universal Sentence Encoder	2018-03-29	Code
73	Reverse DR-AGG	87.2	No	-	Information Aggregation via Dynamic Routing for Sequence Encoding	2018-06-05	Code
74	DC-MCNN	86.99	No	-	-	-	-
75	STM+TSED+PT+2L	86.95	No	-	The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning	2019-05-31	Code
76	Capsule-B	86.8	No	-	Investigating Capsule Networks with Dynamic Routing for Text Classification	2018-03-29	Code
77	2-layer LSTM [tai2015improved]	86.3	No	-	Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks	2015-02-28	Code
78	SWEM-concat	84.3	No	-	Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms	2018-05-24	Code
79	MV-RNN	82.9	No	-	-	-	Code
80	GloVe+Emo2Vec	82.3	No	-	Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training	2018-09-12	Code
81	Emo2Vec	81.2	No	-	Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training	2018-09-12	Code
82	ToWE-CBOW	78.8	No	-	-	-	Code
83	Joined Model Multi-tasking	54.72	No	-	-	-	-