| 1 | XLNet+DSC | 95.77 | Yes | Dice Loss for Data-imbalanced NLP Tasks | 2019-11-07 | Code |
| 2 | T5-11B | 95.64 | Yes | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 3 | XLNet (single model) | 95.1 | Yes | XLNet: Generalized Autoregressive Pretraining fo... | 2019-06-19 | Code |
| 4 | LUKE 483M | 95 | No | LUKE: Deep Contextualized Entity Representations... | 2020-10-02 | Code |
| 5 | T5-3B | 94.95 | Yes | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 6 | T5-Large 770M | 93.79 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 7 | BERT-LARGE (Ensemble+TriviaQA) | 92.2 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 8 | T5-Base | 92.08 | Yes | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 9 | BERT-LARGE (Single+TriviaQA) | 91.1 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 10 | BART Base (with text infilling) | 90.8 | No | BART: Denoising Sequence-to-Sequence Pre-trainin... | 2019-10-29 | Code |
| 11 | BERT large (LAMB optimizer) | 90.584 | No | Large Batch Optimization for Deep Learning: Trai... | 2019-04-01 | Code |
| 12 | BERT-Large-uncased-PruneOFA (90% unstruct sparse) | 90.2 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 13 | BERT-Large-uncased-PruneOFA (90% unstruct sparse, QAT Int8) | 90.02 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 14 | BERT-Base-uncased-PruneOFA (85% unstruct sparse) | 88.42 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 15 | BERT-Base-uncased-PruneOFA (85% unstruct sparse, QAT Int8) | 88.24 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 16 | TinyBERT-6 67M | 87.5 | No | TinyBERT: Distilling BERT for Natural Language U... | 2019-09-23 | Code |
| 17 | BERT-Base-uncased-PruneOFA (90% unstruct sparse) | 87.25 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 18 | T5-Small | 87.24 | Yes | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 19 | R.M-Reader (single) | 86.3 | No | Reinforced Mnemonic Reader for Machine Reading C... | 2017-05-08 | Code |
| 20 | DensePhrases | 86.3 | No | Learning Dense Representations of Phrases at Scale | 2020-12-23 | Code |
| 21 | DistilBERT-uncased-PruneOFA (85% unstruct sparse) | 85.82 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 22 | DistilBERT 66M | 85.8 | No | DistilBERT, a distilled version of BERT: smaller... | 2019-10-02 | Code |
| 23 | BiDAF + Self Attention + ELMo | 85.6 | No | Deep contextualized word representations | 2018-02-15 | Code |
| 24 | DistilBERT-uncased-PruneOFA (85% unstruct sparse, QAT Int8) | 85.13 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 25 | KAR | 84.9 | No | Explicit Utilization of General Knowledge in Mac... | 2018-09-10 | - |
| 26 | DistilBERT-uncased-PruneOFA (90% unstruct sparse) | 84.82 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 27 | SAN (single) | 84.056 | No | Stochastic Answer Networks for Machine Reading C... | 2017-12-10 | Code |
| 28 | DistilBERT-uncased-PruneOFA (90% unstruct sparse, QAT Int8) | 83.87 | No | Prune Once for All: Sparse Pre-Trained Language ... | 2021-11-10 | Code |
| 29 | QANet (data aug x3) | 83.8 | No | QANet: Combining Local Convolution with Global S... | 2018-04-23 | Code |
| 30 | FusionNet | 83.6 | No | FusionNet: Fusing via Fully-Aware Attention with... | 2017-11-16 | Code |
| 31 | QANet (data aug x2) | 83.2 | No | QANet: Combining Local Convolution with Global S... | 2018-04-23 | Code |
| 32 | DCN+ (single) | 83.1 | No | DCN+: Mixed Objective and Deep Residual Coattent... | 2017-10-31 | Code |
| 33 | QANet | 82.7 | No | QANet: Combining Local Convolution with Global S... | 2018-04-23 | Code |
| 34 | PhaseCond (single) | 81.4 | No | Phase Conductor on Multi-layered Attentions for ... | 2017-10-28 | - |
| 35 | SRU | 80.2 | No | Simple Recurrent Units for Highly Parallelizable... | 2017-09-08 | Code |
| 36 | Smarnet | 80.183 | No | Smarnet: Teaching Machines to Read and Comprehen... | 2017-10-08 | - |
| 37 | DCN (Char + CoVe) | 79.9 | No | Learned in Translation: Contextualized Word Vect... | 2017-08-01 | Code |
| 38 | R-NET (single) | 79.5 | No | - | - | - |
| 39 | Ruminating Reader | 79.5 | No | Ruminating Reader: Reasoning with Gated Multi-Ho... | 2017-04-24 | - |
| 40 | DrQA (Document Reader only) | 78.8 | No | Reading Wikipedia to Answer Open-Domain Questions | 2017-03-31 | Code |
| 41 | FastQAExt (beam-size 5) | 78.5 | No | Making Neural QA as Simple as Possible but not S... | 2017-03-14 | Code |
| 42 | jNet (TreeLSTM adaptation, QTLa, K=100) | 78.38 | No | Exploring Question Understanding and Adaptation ... | 2017-03-14 | - |
| 43 | SEDT-LSTM | 77.42 | No | Structural Embedding of Syntactic Trees for Mach... | 2017-03-02 | - |
| 44 | BiDAF (single) | 77.3 | No | Bidirectional Attention Flow for Machine Compreh... | 2016-11-05 | Code |
| 45 | SECT-LSTM | 77.19 | No | Structural Embedding of Syntactic Trees for Mach... | 2017-03-02 | - |
| 46 | MPCM | 75.8 | No | Multi-Perspective Context Matching for Machine C... | 2016-12-13 | Code |
| 47 | DCN | 75.6 | No | Dynamic Coattention Networks For Question Answer... | 2016-11-05 | Code |
| 48 | FABIR | 75.6 | No | A Fully Attention-Based Information Retriever | 2018-10-22 | Code |
| 49 | RASOR | 74.9 | No | Learning Recurrent Span Representations for Extr... | 2016-11-04 | Code |
| 50 | FG fine-grained gate | 71.25 | No | Words or Characters? Fine-grained Gating for Rea... | 2016-11-06 | Code |
| 51 | DCR | 71.2 | No | End-to-End Answer Chunk Extraction and Ranking f... | 2016-10-31 | - |
| 52 | Match-LSTM with Bi-Ans-Ptr (Boundary+Search+b) | 64.7 | No | Machine Comprehension Using Match-LSTM and Answe... | 2016-08-29 | Code |