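| Rank | Model | Accuracy (%) | Extra Training Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | T5-11B | 97.5 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |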
| 2 | MT-DNN-SMART | 97.5 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 3 | T5-3B | 97.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 4 | MUPPET Roberta Large | 97.4 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 5 | ALBERT | 97.1 | No | ALBERT: A Lite BERT for Self-supervised Learning... | 2019-09-26 | Code |
| 6 | StructBERTRoBERTa ensemble | 97.1 | No | StructBERT: Incorporating Language Structures in... | 2019-08-13 | - |
| 7 | XLNet (single model) | 97 | No | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 8 | ELECTRA | 96.9 | No | ELECTRA: Pre-training Text Encoders as Discrimin... | 2020-03-23 | Code |
| 9 | RoBERTa-large 355M + Entailment as Few-shot Learner | 96.9 | No | Entailment as Few-Shot Learner | 2021-04-29 | Code |
| 10 | XLNet-Large (ensemble) | 96.8 | No | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 11 | FLOATER-large | 96.7 | No | Learning to Encode Position for Transformer with... | 2020-03-13 | Code |
| 12 | MUPPET Roberta base | 96.7 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Code |
| 13 | RoBERTa (ensemble) | 96.7 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 14 | DeBERTa (large) | 96.5 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 15 | MT-DNN-ensemble | 96.5 | No | Improving Multi-Task Deep Neural Networks via Kn... | 2019-04-20 | Code |
| 16 | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 96.4 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
| 17 | ASA + RoBERTa | 96.3 | No | Adversarial Self-Attention for Language Understa... | 2022-06-25 | Code |
| 18 | T5-Large 770M | 96.3 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 19 | Snorkel MeTaL (ensemble) | 96.2 | No | Training Complex Models with Multi-Task Weak Sup... | 2018-10-05 | Code |
| 20 | PSQ (Chen et al., 2020) | 96.2 | No | A Statistical Framework for Low-bitwidth Trainin... | 2020-10-27 | Code |
| 21 | Heinsen Routing + RoBERTa-large | 96 | Yes | An Algorithm for Routing Vectors in Sequences | 2022-11-20 | Code |
| 22 | MT-DNN | 95.6 | No | Multi-Task Deep Neural Networks for Natural Lang... | 2019-01-31 | Code |
| 23 | Heinsen Routing + GPT-2 | 95.6 | Yes | An Algorithm for Routing Capsules in All Domains | 2019-11-02 | Code |
| 24 | T5-Base | 95.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 25 | ERNIE 2.0 Base | 95 | No | ERNIE 2.0: A Continual Pre-training Framework fo... | 2019-07-29 | Code |
| 26 | RoBERTa+DualCL | 94.91 | No | Dual Contrastive Learning: Text Classification v... | 2022-01-21 | Code |
| 27 | BERT-LARGE | 94.9 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 28 | RoBERTa + SubRegWeigh (K-means) | 94.84 | No | SubRegWeigh: Effective and Efficient Annotation ... | 2024-09-10 | Code |
| 29 | SpanBERT | 94.8 | No | SpanBERT: Improving Pre-training by Representing... | 2019-07-24 | Code |
| 30 | gMLP-large | 94.8 | No | Pay Attention to MLPs | 2021-05-17 | Code |
| 31 | Q-BERT (Shen et al., 2020) | 94.8 | No | Q-BERT: Hessian Based Ultra Low Precision Quanti... | 2019-09-12 | - |
| 32 | Q8BERT (Zafrir et al., 2019) | 94.7 | No | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Code |
| 33 | CNN Large | 94.6 | No | Cloze-driven Pretraining of Self-attention Netwo... | 2019-03-19 | - |
| 34 | BigBird | 94.6 | No | Big Bird: Transformers for Longer Sequences | 2020-07-28 | Code |
| 35 | MLM + del-word + reorder | 94.5 | No | CLEAR: Contrastive Learning for Sentence Represe... | 2020-12-31 | - |
| 36 | ASA + BERT-base | 94.1 | No | Adversarial Self-Attention for Language Understa... | 2022-06-25 | Code |
| 37 | RealFormer | 94.04 | No | RealFormer: Transformer Likes Residual Attention | 2020-12-21 | Code |
| 38 | FNet-Large | 94 | No | FNet: Mixing Tokens with Fourier Transforms | 2021-05-09 | Code |
| 39 | MT-DNN | 93.6 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 40 | ERNIE | 93.5 | No | ERNIE: Enhanced Language Representation with Inf... | 2019-05-17 | Code |
| 41 | Block-sparse LSTM | 93.2 | No | - | - | Code |
| 42 | LM-CPPF RoBERTa-base | 93.2 | No | LM-CPPF: Paraphrasing-Guided Data Augmentation f... | 2023-05-29 | Code |
| 43 | TinyBERT-6 67M | 93.1 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 44 | 24hBERT | 93 | No | How to Train BERT with an Academic Budget | 2021-04-15 | Code |
| 45 | SMART+BERT-BASE | 93 | No | SMART: Robust and Efficient Fine-Tuning for Pre-... | 2019-11-08 | Code |
| 46 | TinyBERT-4 14.5M | 92.6 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 47 | bmLSTM | 91.8 | No | Learning to Generate Reviews and Discovering Sen... | 2017-04-05 | Code |
| 48 | T5-Small | 91.8 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 49 | byte mLSTM7 | 91.7 | No | A La Carte Embedding: Cheap but Effective Induct... | 2018-05-14 | Code |
| 50 | PAR BERT Base | 91.6 | No | Pay Attention when Required | 2020-09-09 | Code |
| 51 | Charformer-Base | 91.6 | No | Charformer: Fast Character Transformers via Grad... | 2021-06-23 | Code |
| 52 | SqueezeBERT | 91.4 | No | SqueezeBERT: What can computer vision teach NLP ... | 2020-06-19 | Code |
| 53 | Nyströmformer | 91.4 | No | Nyströmformer: A Nyström-Based Algorithm for App... | 2021-02-07 | Code |
| 54 | Bi-CAS-LSTM | 91.3 | No | Cell-aware Stacked LSTMs for Modeling Sentences | 2018-09-07 | - |
| 55 | DistilBERT 66M | 91.3 | No | DistilBERT, a distilled version of BERT: smaller... | 2019-10-02 | Code |
| 56 | CNN | 91.2 | No | On the Role of Text Preprocessing in Neural Netw... | 2017-07-06 | Code |
| 57 | Suffix BiLSTM | 91.2 | No | Improved Sentence Modeling using Suffix Bidirect... | 2018-05-18 | - |
| 58 | BERT Base | 91.2 | No | Fine-grained Sentiment Classification using BERT | 2019-10-04 | Code |
| 59 | Transformer (finetune) | 90.9 | No | Practical Text Classification With Large Pre-Tra... | 2018-12-04 | Code |
| 60 | Single layer bilstm distilled from BERT | 90.7 | No | Distilling Task-Specific Knowledge from BERT int... | 2019-03-28 | Code |
| 61 | BCN+Char+CoVe | 90.3 | No | Learned in Translation: Contextualized Word Vect... | 2017-08-01 | Code |
| 62 | CNN-RNF-LSTM | 90 | No | Convolutional Neural Networks with Recurrent Neu... | 2018-08-28 | Code |
| 63 | Neural Semantic Encoder | 89.7 | No | Neural Semantic Encoders | 2016-07-14 | Code |
| 64 | BLSTM-2DCNN | 89.5 | No | Text Classification Improved by Integrating Bidi... | 2016-11-21 | Code |
| 65 | CNN + Logic rules | 89.3 | No | Harnessing Deep Neural Networks with Logic Rules | 2016-03-21 | Code |
| 66 | DMN [ankit16] | 88.6 | No | Ask Me Anything: Dynamic Memory Networks for Nat... | 2015-06-24 | Code |
| 67 | CNN-multichannel [kim2013] | 88.1 | No | Convolutional Neural Networks for Sentence Class... | 2014-08-25 | Code |
| 68 | Constituency Tree-LSTM with tuned GloVe vectors [tai2015improved] | 88 | No | Improved Semantic Representations From Tree-Stru... | 2015-02-28 | Code |
| 69 | C-LSTM | 87.8 | No | A C-LSTM Neural Network for Text Classification | 2015-11-27 | Code |
| 70 | MPAD-path | 87.75 | No | Message Passing Attention Networks for Document ... | 2019-08-17 | Code |
| 71 | Standard DR-AGG | 87.6 | No | Information Aggregation via Dynamic Routing for ... | 2018-06-05 | Code |
| 72 | USE_T+CNN (lrn w.e.) | 87.21 | No | Universal Sentence Encoder | 2018-03-29 | Code |
| 73 | Reverse DR-AGG | 87.2 | No | Information Aggregation via Dynamic Routing for ... | 2018-06-05 | Code |
| 74 | DC-MCNN | 86.99 | No | - | - | - |
| 75 | STM+TSED+PT+2L | 86.95 | No | The Pupil Has Become the Master: Teacher-Student... | 2019-05-31 | Code |
| 76 | Capsule-B | 86.8 | No | Investigating Capsule Networks with Dynamic Rout... | 2018-03-29 | Code |
| 77 | 2-layer LSTM [tai2015improved] | 86.3 | No | Improved Semantic Representations From Tree-Stru... | 2015-02-28 | Code |
| 78 | SWEM-concat | 84.3 | No | Baseline Needs More Love: On Simple Word-Embeddi... | 2018-05-24 | Code |
| 79 | MV-RNN | 82.9 | No | - | - | Code |
| 80 | GloVe+Emo2Vec | 82.3 | No | Emo2Vec: Learning Generalized Emotion Representa... | 2018-09-12 | Code |
| 81 | Emo2Vec | 81.2 | No | Emo2Vec: Learning Generalized Emotion Representa... | 2018-09-12 | Code |
| 82 | ToWE-CBOW | 78.8 | No | - | - | Code |
| 83 | Joined Model Multi-tasking | 54.72 | No | - | - | - |