Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Language Modelling on The Pile

Metric: Bits per byte (lower is better)
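Bits per byte measures how well a model compresses the raw test text: the model's total cross-entropy over the text, converted from nats to bits (divide by ln 2), normalized by the text's UTF-8 byte count. Normalizing by bytes rather than tokens keeps models with different tokenizers comparable. A minimal sketch of the computation with a Hugging Face GPT-2 checkpoint (the sample string is a stand-in; the leaderboard numbers come from The Pile's held-out data):

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Stand-in evaluation text; real evaluations iterate over The Pile's test split.
text = "The quick brown fox jumps over the lazy dog."

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    # With labels=input_ids, HF shifts internally and returns the
    # mean cross-entropy (in nats) over the predicted tokens.
    mean_nats = model(ids, labels=ids).loss.item()

n_predicted = ids.shape[1] - 1              # the first token is never predicted
total_nats = mean_nats * n_predicted        # undo the mean
n_bytes = len(text.encode("utf-8"))         # byte count of the raw text

bpb = total_nats / (math.log(2) * n_bytes)  # nats -> bits, then per byte
print(f"bits per byte: {bpb:.4f}")
```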


Results

Sorted by bits per byte in descending order; since lower is better, the strongest result (0.557) appears last.

| # | Model | Bits per byte | Extra Data | Paper | Date | Code |
|---|-------|---------------|------------|-------|------|------|
| 1 | GPT-2 Small 124M (pre-trained) | 1.2253 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 2 | GPT-2 Medium 355M (pre-trained) | 1.0928 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 3 | GPT-2 Large 774M (pre-trained) | 1.0828 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 4 | GPT-2 XL 1.5B (pre-trained) | 1.0468 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 5 | GPT-3 Ada 350M (pre-trained) | 0.9631 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 6 | GPT-3 Babbage 1.3B (pre-trained) | 0.8718 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 7 | Test-Time Fine-Tuning with SIFT + GPT-2 (124M) | 0.862 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 8 | GPT-2 Large 774M (test-time training on nearest neighbors) | 0.85 | No | Test-Time Training on Nearest Neighbors for Large Language Models | 2023-05-29 | Code |
| 9 | Llama-3.2-Instruct 1B | 0.807 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 10 | GPT-3 Curie 6.7B (pre-trained) | 0.798 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 11 | Test-Time Fine-Tuning with SIFT + GPT-2 (774M) | 0.762 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 12 | GPT-3 | 0.742 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 | Code |
| 13 | Llama-3.2-Instruct 3B | 0.737 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 14 | Gemma-2 2B | 0.721 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 15 | GPT-3 Davinci 175B (pre-trained) | 0.7177 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 | Code |
| 16 | Llama-3.2 1B | 0.697 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 17 | Phi-3 3.8B | 0.679 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 18 | Phi-3 7B | 0.678 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 19 | Gemma-2 9B | 0.67 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 20 | Phi-3 14B | 0.651 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 21 | Jurassic-1 | 0.65 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 | Code |
| 22 | Llama-3.2 3B | 0.64 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 23 | GLM-130B | 0.634 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 | Code |
| 24 | Gemma-2 27B | 0.629 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 25 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B) | 0.606 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 26 | Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B) | 0.595 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
| 27 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B) | 0.557 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 | Code |
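Several of the strongest entries ("test-time training on nearest neighbors", "Test-Time Fine-Tuning with SIFT") adapt the model per test instance: retrieve data related to the prompt, take a few gradient steps on it, then score the prompt with the adapted copy. A schematic sketch of that loop, not the authors' implementation; `retrieve_neighbors` is an assumed helper (e.g. embedding search over an index of the training corpus), and SIFT additionally selects retrieved examples to be informative about the prompt rather than merely nearest:

```python
import copy
import torch

def test_time_eval(model, tokenizer, prompt, retrieve_neighbors, k=8, lr=5e-5):
    """Schematic test-time fine-tuning: adapt a throwaway copy of the model
    on retrieved neighbors of the prompt, then return its loss (nats/token)
    on the prompt itself. `retrieve_neighbors(prompt, k)` is hypothetical."""
    local = copy.deepcopy(model)                     # never mutate the base model
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    local.train()
    for text in retrieve_neighbors(prompt, k=k):     # one gradient step per neighbor
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = local(ids, labels=ids).loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    local.eval()
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        return local(ids, labels=ids).loss.item()
```

The deep copy is the key design choice: adaptation is discarded after each test instance, so the leaderboard entries remain "Extra Data: No" evaluations of the same base checkpoint.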