Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Language Modelling on The Pile

Metric: Test perplexity (lower is better)
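
Concretely, test perplexity is the exponential of the average negative log-likelihood per test token. A minimal sketch of the computation (the helper name is illustrative, not from any particular codebase):

```python
import math

def perplexity(token_log_probs):
    """Exponential of the mean negative log-likelihood
    (natural log) over all test tokens. Lower is better."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy check: four tokens, each given probability 0.1 by the
# model, yield perplexity 10 -- the top score in the table below.
print(perplexity([math.log(0.1)] * 4))  # ~10.0
```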

Results

| # | Model | Test perplexity | Extra Data | Paper | Date | Code |
|---|-------|-----------------|------------|-------|------|------|
| 1 | Larger Transformer 771M (fine-tuned) | 10 | No | Need a Small Specialized Language Model? Plan Ea... | 2024-02-02 | - |
| 2 | Hybrid H3 125M | 10.2 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 3 | GPT-Neo 2.7B | 10.44 | No | Knowledge Unlearning for Mitigating Privacy Risk... | 2022-10-04 | Code |
| 4 | Transformer 125M | 10.7 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 5 | GPT-Neo 1.3B | 11.46 | No | Knowledge Unlearning for Mitigating Privacy Risk... | 2022-10-04 | Code |
| 6 | Smaller Transformer 126M (fine-tuned) | 12 | No | Need a Small Specialized Language Model? Plan Ea... | 2024-02-02 | - |
| 7 | OPT 2.7B | 17.81 | No | Knowledge Unlearning for Mitigating Privacy Risk... | 2022-10-04 | Code |
| 8 | GPT-Neo 125M | 17.83 | No | Knowledge Unlearning for Mitigating Privacy Risk... | 2022-10-04 | Code |
| 9 | OPT 1.3B | 19.55 | No | Knowledge Unlearning for Mitigating Privacy Risk... | 2022-10-04 | Code |
| 10 | Larger Transformer 771M (pre-trained) | 28.1 | No | Need a Small Specialized Language Model? Plan Ea... | 2024-02-02 | - |
| 11 | OPT 125M | 32.26 | No | Knowledge Unlearning for Mitigating Privacy Risk... | 2022-10-04 | Code |
| 12 | Smaller Transformer 126M (pre-trained) | 33 | No | Need a Small Specialized Language Model? Plan Ea... | 2024-02-02 | - |
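
Entries like these are typically produced by scoring the held-out Pile test split with a fixed-context window. A minimal sketch along those lines, using Hugging Face transformers with GPT-Neo 125M (a real model ID matching a row in the table; the `pile_test_texts` iterable is an assumption, since the Pile test data must be obtained separately):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-Neo 125M: one of the checkpoints on the leaderboard above.
model_id = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def doc_nll(text, max_len=1024):
    """Summed token negative log-likelihood for one document,
    scored in consecutive windows of up to max_len tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    total_nll, total_tokens = 0.0, 0
    for start in range(0, len(ids) - 1, max_len):
        window = ids[start : start + max_len + 1].unsqueeze(0)
        with torch.no_grad():
            # With labels=input_ids, the model returns the mean
            # cross-entropy over the window's predicted tokens.
            out = model(window, labels=window)
        n = window.size(1) - 1  # number of predicted tokens
        total_nll += out.loss.item() * n
        total_tokens += n
    return total_nll, total_tokens

# pile_test_texts is assumed to be an iterable of raw test documents.
# nll, n = map(sum, zip(*(doc_nll(t) for t in pile_test_texts)))
# print("Test perplexity:", math.exp(nll / n))
```

Window size and stride choices affect the result slightly, which is one reason reported perplexities for the same checkpoint can differ across papers.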