Sentence Completion on HellaSwag
Metric: Accuracy (higher is better)
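As a rough illustration of the metric, the sketch below computes accuracy as the percentage of examples where the chosen ending matches the gold ending. It assumes each example is a four-way multiple choice, which is consistent with the random chance baseline of 25 in the results. The function name `accuracy` and the toy data are hypothetical and not taken from any submission.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of examples whose predicted ending index
    equals the gold ending index (hypothetical helper for illustration)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy data (assumed): indices 0-3 select one of four candidate endings.
gold = [0, 3, 1, 2]
pred = [0, 3, 2, 2]
print(accuracy(pred, gold))  # 75.0
```

Scores on the leaderboard are reported on this 0 to 100 scale, so a model that always guesses uniformly at random would be expected to land near 25.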
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code | Badge |
|---|-------|----------|------------|-------|------|------|-------|
| 1 | CompassMTL 567M with Tailor | 96.1 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code | SOTA |
| 2 | CompassMTL 567M | 95.6 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code | - |
| 3 | DeBERTa-Large 304M (classification-based) | 95.6 | No | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 2022-10-29 | Code | - |
| 4 | GPT-4 (10-shot) | 95.3 | No | GPT-4 Technical Report | 2023-03-15 | Code | - |
| 5 | LLaMA3+MoSLoRA | 95 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code | - |
| 6 | DeBERTa-Large 304M | 94.7 | No | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | 2022-10-29 | Code | - |
| 7 | LLaMA-2 13B + MixLoRA | 94.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code | - |
| 8 | Unicorn 11B (fine-tuned) | 93.9 | Yes | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | 2021-03-24 | Code | SOTA |
| 9 | LLaMA-3 8B + MixLoRA | 93.3 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code | - |
| 10 | LLaMA-2 7B + MixLoRA | 93.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Code | - |
| 11 | DeBERTa++ | 93 | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Code | SOTA |
| 12 | ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag) | 91.5 | No | DiscoSense: Commonsense Reasoning with Discourse Connectives | 2022-10-22 | Code | - |
| 13 | DBRX Instruct 132B (10-shot) | 89 | No | - | - | - | - |
| 14 | TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 (10-shot) | 88.3 | No | - | - | - | - |
| 15 | ALBERT-XXL 235M | 88 | No | - | - | - | - |
| 16 | PaLM 2-L (1-shot) | 87.4 | No | PaLM 2 Technical Report | 2023-05-17 | Code | - |
| 17 | ELECTRA-Large 335M (fine-tuned on HellaSwag) | 86.9 | No | DiscoSense: Commonsense Reasoning with Discourse Connectives | 2022-10-22 | Code | - |
| 18 | PaLM 2-M (1-shot) | 86.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code | - |
| 19 | MUPPET RoBERTa Large | 86.4 | No | Muppet: Massive Multi-task Representations with Pre-Finetuning | 2021-01-26 | Code | - |
| 20 | LLaMA 65B + CFG (0-shot) | 86.3 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - | - |
| 21 | Falcon-180B (0-shot) | 85.9 | No | The Falcon Series of Open Language Models | 2023-11-28 | - | - |
| 22 | PaLM 2-S (1-shot) | 85.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code | - |
| 23 | GPT-3.5 (10-shot) | 85.5 | No | GPT-4 Technical Report | 2023-03-15 | Code | - |
| 24 | RoBERTa-Large Ensemble | 85.5 | No | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Code | SOTA |
| 25 | LLaMA 30B + CFG (0-shot) | 85.3 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - | - |
| 26 | LLaMA 2 70B (0-shot) | 85.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code | - |
| 27 | HyKAS+CSKG | 85 | No | Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | 2019-10-30 | - | - |
| 28 | LLaMA 65B (0-shot) | 84.2 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | - |
| 29 | PaLM-540B (Few-Shot) | 83.8 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code | - |
| 30 | PaLM-540B (1-shot) | 83.6 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code | - |
| 31 | ExDeBERTa 567M | 83.6 | No | Task Compass: Scaling Multi-task Pre-training with Task Prefix | 2022-10-12 | Code | - |
| 32 | PaLM-540B (0-shot) | 83.4 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code | - |
| 33 | LLaMA 2 34B (0-shot) | 83.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code | - |
| 34 | Camelidae-8×34B (10-shot) | 83.2 | No | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | 2024-01-05 | Code | - |
| 35 | LLaMA 33B (0-shot) | 82.8 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | - |
| 36 | Falcon-40B (0-shot) | 82.7 | No | The Falcon Series of Open Language Models | 2023-11-28 | - | - |
| 37 | Megatron-Turing NLG 530B (Few-Shot) | 82.4 | No | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | 2022-01-28 | Code | - |
| 38 | Qwen2idae-16x14B (10-shot) | 82.3 | No | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | 2024-01-05 | Code | - |
| 39 | LLaMA 13B + CFG (0-shot) | 82.1 | No | Stay on topic with Classifier-Free Guidance | 2023-06-30 | - | - |
| 40 | RoBERTa-Large 355M | 81.7 | No | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Code | - |
| 41 | Mistral 7B (0-shot) | 81.3 | No | Mistral 7B | 2023-10-10 | Code | - |
| 42 | Chinchilla 70B (0-shot) | 80.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code | - |
| 43 | LLaMA 2 13B (0-shot) | 80.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code | - |
| 44 | Megatron-Turing NLG 530B (1-shot) | 80.2 | No | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | 2022-01-28 | Code | - |
| 45 | GPT-3 175B (few-shot, k=32) | 79.3 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code | - |
| 46 | Gopher 280B (0-shot) | 79.2 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | - |
| 47 | LLaMA 13B (0-shot) | 79.2 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | - |
| 48 | GPT-3 (0-shot) | 78.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code | - |
| 49 | LLaMA 2 7B (0-shot) | 77.2 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code | - |
| 50 | Falcon-7B (0-shot) | 76.3 | No | The Falcon Series of Open Language Models | 2023-11-28 | - | - |
| 51 | LLaMA 7B (0-shot) | 76.1 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code | - |
| 52 | BloombergGPT 50B (1-shot) | 73.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | - |
| 53 | OPT 66B (1-shot) | 73.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | - |
| 54 | BLOOM 176B (1-shot) | 73.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | - |
| 55 | Sheared-LLaMA-2.7B (50B) | 70.8 | No | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | 2023-10-10 | Code | - |
| 56 | GPT-NeoX 20B (1-shot) | 68.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code | - |
| 57 | Open-LLaMA-3B-v2 | 67.6 | No | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | 2023-10-10 | Code | - |
| 58 | Mamba-2.8B | 66.1 | No | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 2023-12-01 | Code | - |
| 59 | Sheared-LLaMA-1.3B (50B) | 60.7 | No | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | 2023-10-10 | Code | - |
| 60 | FLAN 137B (3-shot) | 59.2 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code | - |
| 61 | Mamba-1.4B | 59.1 | No | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 2023-12-01 | Code | - |
| 62 | FLAN 137B (0-shot) | 56.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code | - |
| 63 | sMLP – deterministic 9.4B (0-shot) | 54.5 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | - |
| 64 | Switch Transformer 9B | 52.5 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | - |
| 65 | GPT-3 Large 760M (0-shot) | 51 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code | - |
| 66 | GPT-2-XL 1.5B | 50.9 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | - |
| 67 | OPT-6.7B | 50.3 | No | LLM in a flash: Efficient Large Language Model Inference with Limited Memory | 2023-12-12 | - | - |
| 68 | LLM in a Flash (OPT-6.7B with Predictor) | 49.8 | No | LLM in a flash: Efficient Large Language Model Inference with Limited Memory | 2023-12-12 | - | - |
| 69 | FLAN-T5-Large 783M | 48.7 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | - |
| 70 | LaMini-GPT 1.5B | 48.3 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | - |
| 71 | BERT-Large 340M | 47.3 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | SOTA |
| 72 | LaMini-F-T5 783M | 43.7 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | - |
| 73 | GPT-1 117M | 41.7 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 74 | Flipped-3B | 41.6 | No | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | 2022-10-06 | Code | - |
| 75 | T0-3B (CoT fine-tuned) | 41.1 | No | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | 2023-05-23 | Code | - |
| 76 | LaMini-T5 738M | 40.6 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | - |
| 77 | BERT-Base 110M | 40.5 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 78 | T5-Large 738M | 38.9 | No | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | 2023-04-27 | Code | - |
| 79 | GShard 9B | 38 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | - |
| 80 | LSTM + BERT-Base | 36.2 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 81 | RoE-3B | 34.6 | No | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | 2023-02-07 | Code | - |
| 82 | ESIM + ELMo | 33.3 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 83 | HASH Layers 10B (0-shot) | 33 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | - |
| 84 | LSTM + GloVe | 31.7 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 85 | fastText | 31.6 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 86 | LSTM + ELMo | 31.4 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |
| 87 | Base Layers 10B (0-shot) | 30.2 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - | - |
| 88 | KiC-770M | 29.6 | No | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | 2022-10-28 | - | - |
| 89 | Random chance baseline | 25 | No | HellaSwag: Can a Machine Really Finish Your Sentence? | 2019-05-19 | Code | - |