Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Language Modelling on Penn Treebank (Word Level)

Metric: Validation perplexity (lower is better)
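The metric reported below can be sketched in a few lines: perplexity is the exponential of the average negative log-probability a model assigns to the evaluation tokens. This is a minimal illustration of the standard definition, not code from any of the listed papers.

```python
import math

def perplexity(token_log_probs):
    """Word-level perplexity: exp of the mean negative natural-log
    probability over the evaluation tokens. Lower is better."""
    n = len(token_log_probs)
    avg_nll = -sum(token_log_probs) / n
    return math.exp(avg_nll)

# Sanity check: a uniform model over a 10-word vocabulary assigns
# each token probability 0.1, so its perplexity is exactly 10.
uniform = [math.log(0.1)] * 5
print(round(perplexity(uniform), 6))  # → 10.0
```

A perplexity of 36.1 (the top entry below) thus means the model is, on average, as uncertain as if it were choosing uniformly among about 36 words at each position.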

Results

| # | Model | Validation perplexity | Extra Data | Paper | Date | Code |
|---|-------|----------------------|------------|-------|------|------|
| 1 | BERT-Large-CAS | 36.1 | Yes | Language Models with Transformers | 2019-04-20 | Code |
| 2 | Mogrifier LSTM + dynamic eval | 44.8 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 3 | adversarial + AWD-LSTM-MoS + dynamic eval | 46.63 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 4 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 46.64 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Code |
| 5 | FRAGE + AWD-LSTM-MoS + dynamic eval | 47.38 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 6 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 48 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | - |
| 7 | AWD-LSTM-MoS + dynamic eval | 48.33 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 8 | AWD-LSTM-DOC x5 | 48.63 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 9 | Ensemble of All | 48.92 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Code |
| 10 | AWD-LSTM-DRILL + dynamic eval | 49.5 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 11 | AWD-LSTM + dynamic eval | 51.6 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 12 | AWD-LSTM-DOC + Partial Shuffle | 53.79 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 13 | AWD-LSTM + continuous cache pointer | 53.9 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 14 | AWD-LSTM-DOC | 54.12 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 15 | AWD-LSTM-MoS + Partial Shuffle | 55.89 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 16 | AWD-LSTM-MoS | 56.54 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 17 | Transformer-XL | 56.72 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 18 | AWD-FWM Schlag et al. (2020) | 56.76 | No | Learning Associative Inference Using Fast Weight... | 2020-11-16 | Code |
| 19 | 2-layer skip-LSTM + dropout tuning | 57.1 | No | Pushing the bounds of dropout | 2018-05-23 | Code |
| 20 | Transformer-XL + AutoDropout | 58.1 | No | AutoDropout: Learning Dropout Patterns to Regula... | 2021-01-05 | Code |
| 21 | AWD-LSTM-DRILL | 58.2 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 22 | Differentiable NAS | 58.3 | No | DARTS: Differentiable Architecture Search | 2018-06-24 | Code |
| 23 | AWD-LSTM 3-layer with Fraternal dropout | 58.9 | No | Fraternal Dropout | 2017-10-31 | Code |
| 24 | AWD-LSTM | 60 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 25 | Efficient NAS | 60.8 | No | Efficient Neural Architecture Search via Paramet... | 2018-02-09 | Code |
| 26 | Recurrent highway networks | 67.9 | No | Recurrent Highway Networks | 2016-07-12 | Code |
| 27 | Inan et al. (2016) - Variational RHN | 68.1 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |
| 28 | Gal & Ghahramani (2016) - Variational LSTM (large) | 77.9 | No | A Theoretically Grounded Application of Dropout ... | 2015-12-16 | Code |
| 29 | Gal & Ghahramani (2016) - Variational LSTM (medium) | 81.9 | No | A Theoretically Grounded Application of Dropout ... | 2015-12-16 | Code |
| 30 | Zaremba et al. (2014) - LSTM (large) | 82.2 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |
| 31 | Zaremba et al. (2014) - LSTM (medium) | 86.2 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |