Language Modelling on The Pile
Metric: Bits per byte (lower is better)
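Bits per byte (BPB) is the model's total negative log-likelihood on the test text, converted from nats to bits and divided by the text's length in UTF-8 bytes; unlike per-token perplexity, it is comparable across models with different tokenizers. Below is a minimal sketch of the computation for a Hugging Face causal LM; the `gpt2` checkpoint and the sample string are illustrative, and a real evaluation over The Pile would chunk long documents with a sliding window:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any causal LM evaluates the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def bits_per_byte(text: str) -> float:
    """Total cross-entropy of `text` in bits, divided by its UTF-8 byte length."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # (in nats) over the seq_len - 1 predicted positions.
        loss = model(**enc, labels=enc["input_ids"]).loss
    n_predicted = enc["input_ids"].size(1) - 1
    total_bits = loss.item() * n_predicted / math.log(2)  # nats -> bits
    return total_bits / len(text.encode("utf-8"))

print(f"{bits_per_byte('The Pile is a large corpus of diverse English text.'):.4f}")
```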
Results
Ranked by bits per byte, lowest (best) first. No entry uses extra training data.

| # | Model | Bits per byte | Extra Data | Paper | Date |
|---|-------|---------------|------------|-------|------|
| 1 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B) | 0.557 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 2 | Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B) | 0.595 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 3 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B) | 0.606 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 4 | Gemma-2 27B | 0.629 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 5 | GLM-130B | 0.634 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 |
| 6 | Llama-3.2 3B | 0.64 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 7 | Jurassic-1 | 0.65 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 |
| 8 | Phi-3 14B | 0.651 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 9 | Gemma-2 9B | 0.67 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 10 | Phi-3 7B | 0.678 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 11 | Phi-3 3.8B | 0.679 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 12 | Llama-3.2 1B | 0.697 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 13 | GPT-3 Davinci 175B (pre-trained) | 0.7177 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 14 | Gemma-2 2B | 0.721 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 15 | Llama-3.2-Instruct 3B | 0.737 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 16 | GPT-3 | 0.742 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 |
| 17 | Test-Time Fine-Tuning with SIFT + GPT-2 (774M) | 0.762 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 18 | GPT-3 Curie 6.7B (pre-trained) | 0.798 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 19 | Llama-3.2-Instruct 1B | 0.807 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 20 | GPT-2 Large 774M (test-time training on nearest neighbors) | 0.85 | No | Test-Time Training on Nearest Neighbors for Large Language Models | 2023-05-29 |
| 21 | Test-Time Fine-Tuning with SIFT + GPT-2 (124M) | 0.862 | No | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | 2024-10-10 |
| 22 | GPT-3 Babbage 1.3B (pre-trained) | 0.8718 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 23 | GPT-3 Ada 350M (pre-trained) | 0.9631 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 24 | GPT-2 XL 1.5B (pre-trained) | 1.0468 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 25 | GPT-2 Large 774M (pre-trained) | 1.0828 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 26 | GPT-2 Medium 355M (pre-trained) | 1.0928 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
| 27 | GPT-2 Small 124M (pre-trained) | 1.2253 | No | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | 2020-12-31 |
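For scale, a score of b bits per byte corresponds to a per-byte perplexity of 2^b: about 2^0.557 ≈ 1.47 for the top entry versus about 2^1.2253 ≈ 2.34 for GPT-2 Small, so because the metric is logarithmic, the gap between the best and worst entries is larger than the raw scores suggest.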