Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Language Modelling on WikiText-2

Metric: Validation perplexity (lower is better)
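
The page does not spell out how the metric is computed. As a minimal sketch, assuming the standard definition of perplexity (the exponential of the mean negative log-likelihood per token on the validation set), the calculation looks like the following. The function name `perplexity` and the example probabilities are hypothetical and are not taken from any listed paper.

```python
import math

def perplexity(token_log_probs):
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical natural-log probabilities a model assigned to four validation tokens.
example = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
print(perplexity(example))  # approximately 4
```

Under this definition, a model that spreads probability uniformly over k choices at every step scores a perplexity of k, which is why lower values indicate a better model.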

Results

| # | Model | Validation perplexity | Extra Data | Paper | Date | Code |
|---|-------|-----------------------|------------|-------|------|------|
| 1 | GPT-2 (fine-tuned) | 15.69 | Yes | Hydra: A System for Large Multi-Model Deep Learn... | 2021-10-16 | Code |
| 2 | BERT-Large-CAS | 37.7 | Yes | Language Models with Transformers | 2019-04-20 | Code |
| 3 | Mogrifier LSTM + dynamic eval | 40.2 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 4 | adversarial + AWD-LSTM-MoS + dynamic eval | 40.27 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 5 | FRAGE + AWD-LSTM-MoS + dynamic eval | 40.85 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 6 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 42 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | - |
| 7 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 42.19 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Code |
| 8 | AWD-LSTM-MoS + dynamic eval | 42.41 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 9 | AWD-LSTM-DRILL + dynamic eval | 43.9 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 10 | AWD-LSTM + dynamic eval | 46.4 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 11 | AWD-LSTM + continuous cache pointer | 53.8 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 12 | AWD-LSTM-DOC x5 | 54.19 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 13 | AWD-FWM Schlag et al. (2020) | 54.48 | No | Learning Associative Inference Using Fast Weight... | 2020-11-16 | Code |
| 14 | Ensemble of All | 55.4 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Code |
| 15 | Mogrifier LSTM | 57.3 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 16 | AWD-LSTM-DOC + Partial Shuffle | 60.16 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 17 | AWD-LSTM-DOC | 60.29 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 18 | AWD-LSTM-MoS + Partial Shuffle | 62.38 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 19 | AWD-LSTM-MoS | 63.88 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 20 | AWD-LSTM-DRILL | 64.9 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 21 | AWD-LSTM 3-layer with Fraternal dropout | 66.8 | No | Fraternal Dropout | 2017-10-31 | Code |
| 22 | AWD-LSTM + ATOI | 67.47 | No | Alleviating Sequence Information Loss with Data ... | 2019-09-18 | Code |
| 23 | AWD-LSTM | 68.6 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 24 | Melis et al. (2017) - 1-layer LSTM (tied) | 69.3 | No | On the State of the Art of Evaluation in Neural ... | 2017-07-18 | Code |
| 25 | Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | 91.5 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |
| 26 | Inan et al. (2016) - Variational LSTM (tied) (h=650) | 92.3 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |