Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Language Modelling on Penn Treebank (Word Level)

Metric: Test perplexity (lower is better)
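
For readers unfamiliar with the metric, perplexity is the exponential of the average per-token negative log-likelihood on the test set. The following is a minimal sketch of that definition, not the evaluation code of any listed paper; the function name `perplexity` and the sample inputs are hypothetical, and natural-log probabilities are assumed.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given per-token natural-log probabilities.

    Assumption: each entry is log p(token | context) as assigned by
    some language model to one test-set token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is exp of the average negative log-likelihood.
    return math.exp(avg_nll)


if __name__ == "__main__":
    # Hypothetical model that gives every test token probability 1/50.
    uniform_log_probs = [math.log(1 / 50)] * 1000
    print(perplexity(uniform_log_probs))  # approximately 50
```

A model that spreads probability uniformly over k choices at every step has perplexity k, which is why lower values indicate a better model.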

Results

| # | Model | Test perplexity | Extra Data | Paper | Date | Code |
|---|-------|-----------------|------------|-------|------|------|
| 1 | GPT-3 (Zero-Shot) | 20.5 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 2 | BERT-Large-CAS | 31.3 | Yes | Language Models with Transformers | 2019-04-20 | Code |
| 3 | GPT-2 | 35.76 | Yes | - | - | Code |
| 4 | Mogrifier LSTM + dynamic eval | 44.9 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 5 | adversarial + AWD-LSTM-MoS + dynamic eval | 46.01 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 6 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 46.34 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Code |
| 7 | FRAGE + AWD-LSTM-MoS + dynamic eval | 46.54 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 8 | AWD-LSTM-DOC x5 | 47.17 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 9 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 47.3 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | - |
| 10 | Ensemble of All | 47.31 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Code |
| 11 | AWD-LSTM-MoS + dynamic eval | 47.69 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 12 | AWD-LSTM-DRILL + dynamic eval | 49.4 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 13 | Dense IndRNN+dynamic eval | 50.97 | No | Deep Independently Recurrent Neural Network (Ind... | 2019-10-11 | Code |
| 14 | AWD-LSTM + dynamic eval | 51.1 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 15 | AWD-LSTM-DOC + Partial Shuffle | 52 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 16 | AWD-LSTM-DOC | 52.38 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 17 | AWD-LSTM + continuous cache pointer | 52.8 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 18 | AWD-LSTM-MoS + Partial Shuffle | 53.92 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 19 | Trellis Network | 54.19 | No | Trellis Networks for Sequence Modeling | 2018-10-15 | Code |
| 20 | AWD-LSTM-MoS | 54.44 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 21 | AWD-FWM Schlag et al. (2020) | 54.48 | No | Learning Associative Inference Using Fast Weight... | 2020-11-16 | Code |
| 22 | Transformer-XL | 54.55 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 23 | Transformer-XL + AutoDropout | 54.9 | No | AutoDropout: Learning Dropout Patterns to Regula... | 2021-01-05 | Code |
| 24 | 2-layer skip-LSTM + dropout tuning | 55.3 | No | Pushing the bounds of dropout | 2018-05-23 | Code |
| 25 | AWD-LSTM-DRILL | 55.7 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 26 | Differentiable NAS | 56.1 | No | DARTS: Differentiable Architecture Search | 2018-06-24 | Code |
| 27 | Dense IndRNN | 56.37 | No | Deep Independently Recurrent Neural Network (Ind... | 2019-10-11 | Code |
| 28 | AWD-LSTM 3-layer with Fraternal dropout | 56.8 | No | Fraternal Dropout | 2017-10-31 | Code |
| 29 | DEQ-TrellisNet | 57.1 | No | Deep Equilibrium Models | 2019-09-03 | Code |
| 30 | AWD-LSTM | 57.3 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 31 | Efficient NAS | 58.6 | No | Efficient Neural Architecture Search via Paramet... | 2018-02-09 | Code |
| 32 | NAS-RL | 64 | No | Neural Architecture Search with Reinforcement Le... | 2016-11-05 | Code |
| 33 | Recurrent highway networks | 65.4 | No | Recurrent Highway Networks | 2016-07-12 | Code |
| 34 | Inan et al. (2016) - Variational RHN | 66 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |
| 35 | Gal & Ghahramani (2016) - Variational LSTM (large) | 75.2 | No | A Theoretically Grounded Application of Dropout ... | 2015-12-16 | Code |
| 36 | Zaremba et al. (2014) - LSTM (large) | 78.4 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |
| 37 | LSTM (Bai et al., 2018) | 78.93 | No | An Empirical Evaluation of Generic Convolutional... | 2018-03-04 | Code |
| 38 | Gal & Ghahramani (2016) - Variational LSTM (medium) | 79.7 | No | A Theoretically Grounded Application of Dropout ... | 2015-12-16 | Code |
| 39 | Zaremba et al. (2014) - LSTM (medium) | 82.7 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |
| 40 | R-Transformer | 84.38 | No | R-Transformer: Recurrent Neural Network Enhanced... | 2019-07-12 | Code |
| 41 | GRU (Bai et al., 2018) | 92.48 | No | An Empirical Evaluation of Generic Convolutional... | 2018-03-04 | Code |
| 42 | Seq-U-Net | 107.95 | No | Seq-U-Net: A One-Dimensional Causal U-Net for Ef... | 2019-11-14 | Code |
| 43 | TCN | 108.47 | No | Seq-U-Net: A One-Dimensional Causal U-Net for Ef... | 2019-11-14 | Code |