Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Gopher-280B (few-shot, k=5)

Reported on 74 benchmarks across 52 tasks · 1 paper · 73 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
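To illustrate the note above, here is a minimal sketch of exact-name matching. It assumes each result is stored as a hypothetical (model_name, task, accuracy) record. The record layout and the `group_by_exact_name` helper are illustrative assumptions, not the site's actual implementation. The scores are copied from this page.

```python
from collections import defaultdict

# Hypothetical record layout: (model_name, task, accuracy).
# Scores come from this page; the grouping logic is an assumed illustration.
RESULTS = [
    ("Gopher-280B (few-shot, k=5)", "Fact Checking", 61.7),
    ("Gopher-280B (few-shot, k=10)", "Fact Checking", 77.5),
    ("Chinchilla-70B (few-shot, k=5)", "General Knowledge", 94.3),
    ("Gopher-280B (few-shot, k=5)", "General Knowledge", 93.9),
]


def group_by_exact_name(results):
    """Group results by the model-name string using plain equality.

    No normalization is applied. Names that differ in any character (for
    example, a different shot count k) land in separate groups. Two papers
    that use an identical name for different variants would be merged.
    """
    groups = defaultdict(list)
    for model, task, score in results:
        groups[model].append((task, score))
    return dict(groups)


pages = group_by_exact_name(RESULTS)
print(pages["Gopher-280B (few-shot, k=5)"])
# [('Fact Checking', 61.7), ('General Knowledge', 93.9)]
print("Gopher-280B (few-shot, k=10)" in pages)  # True: a separate model page
```

Because the comparison is plain string equality, the k=10 Gopher results appear under a separate model name. This is why some entries below cite "Gopher-280B (few-shot, k=10)" as the best result rather than listing it here.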

Miscellaneous · 45 results

Ethics on BIG-bench
Accuracy · 2021-12-08
70
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
General Knowledge on BIG-bench
Accuracy · 2021-12-08
93.9
best: 94.3 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School European History on BIG-bench
Accuracy · 2021-12-08
72.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School US History on BIG-bench
Accuracy · 2021-12-08
78.9
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School World History on BIG-bench
Accuracy · 2021-12-08
75.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
International Law on BIG-bench
Accuracy · 2021-12-08
77.7
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Jurisprudence on BIG-bench
Accuracy · 2021-12-08
71.3
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Fallacies on BIG-bench
Accuracy · 2021-12-08
72.4
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Management on BIG-bench
Accuracy · 2021-12-08
77.7
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Marketing on BIG-bench
Accuracy · 2021-12-08
83.3
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Philosophy on BIG-bench
Accuracy · 2021-12-08
68.8
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Prehistory on BIG-bench
Accuracy · 2021-12-08
67.6
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Professional Law on BIG-bench
Accuracy · 2021-12-08
44.5
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
World Religions on BIG-bench
Accuracy · 2021-12-08
84.2
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Anatomy on BIG-bench
Accuracy · 2021-12-08
56.3
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Clinical Knowledge on BIG-bench
Accuracy · 2021-12-08
67.2
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
College Medicine on BIG-bench
Accuracy · 2021-12-08
60.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Human Aging on BIG-bench
Accuracy · 2021-12-08
66.4
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Human Organs Senses Multiple Choice on BIG-bench
Accuracy · 2021-12-08
84.8
best: 85.7 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Nutrition on BIG-bench
Accuracy · 2021-12-08
69.9
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Professional Medicine on BIG-bench
Accuracy · 2021-12-08
64
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Virology on BIG-bench
Accuracy · 2021-12-08
47
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Econometrics on BIG-bench
Accuracy · 2021-12-08
43
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School Geography on BIG-bench
Accuracy · 2021-12-08
76.8
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School Government and Politics on BIG-bench
Accuracy · 2021-12-08
83.9
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School Macroeconomics on BIG-bench
Accuracy · 2021-12-08
65.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School Microeconomics on BIG-bench
Accuracy · 2021-12-08
66.4
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
High School Psychology on BIG-bench
Accuracy · 2021-12-08
81.8
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Human Sexuality on BIG-bench
Accuracy · 2021-12-08
67.2
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Professional Psychology on BIG-bench
Accuracy · 2021-12-08
68.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Public Relations on BIG-bench
Accuracy · 2021-12-08
71.8
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Security Studies on BIG-bench
Accuracy · 2021-12-08
64.9
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Sociology on BIG-bench
Accuracy · 2021-12-08
84.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
US Foreign Policy on BIG-bench
Accuracy · 2021-12-08
81
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Intent Recognition on BIG-bench
Accuracy · 2021-12-08
88.7
best: 92.8 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Astronomy on BIG-bench
Accuracy · 2021-12-08
65.8
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Computer Security on BIG-bench
Accuracy · 2021-12-08
65
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Ethics on BIG-bench
Accuracy · 2021-12-08
40.2
best: 70
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Ethics on BIG-bench
Accuracy · 2021-12-08
55.1
best: 70
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Ethics on BIG-bench
Accuracy · 2021-12-08
66.8
best: 70
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Fact Checking on BIG-bench
Accuracy · 2021-12-08
61.7
best: 77.5 (Gopher-280B (few-shot, k=10))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Fact Checking on BIG-bench
Accuracy · 2021-12-08
69.1
best: 77.5 (Gopher-280B (few-shot, k=10))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
General Knowledge on BIG-bench
Accuracy · 2021-12-08
75.7
best: 94.3 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
General Knowledge on BIG-bench
Accuracy · 2021-12-08
81.8
best: 94.3 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
General Knowledge on BIG-bench
Accuracy · 2021-12-08
38
best: 94.3 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446

Natural Language Processing · 37 results

Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
88.7
best: 94 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
71.6
best: 78 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Question Answering on BIG-bench (Novel Concepts)
Accuracy · 2021-12-08
59.1
best: 71.9 (PaLM-540B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Question Answering on BIG-bench (Movie Recommendation)
Accuracy · 2021-12-08
50.5
best: 94.4 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Question Answering on BIG-bench (Navigate)
Accuracy · 2021-12-08
51.1
best: 91.2 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Question Answering on BIG-bench (Ruin Names)
Accuracy · 2021-12-08
38.6
best: 90 (PaLM 2 (few-shot, k=3, Direct))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Question Answering on BIG-bench (Hyperbaton)
Accuracy · 2021-12-08
51.7
best: 92 (Bloomberg GPT (few-shot, k=3))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Causal Judgment)
Accuracy · 2021-12-08
50.8
best: 62 (PaLM 2 (few-shot, k=3, Direct))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Disambiguation QA)
Accuracy · 2021-12-08
45.5
best: 78.8 (PaLM 2 (few-shot, k=3, Direct))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Sports Understanding)
Accuracy · 2021-12-08
54.9
best: 98 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Winowhy)
Accuracy · 2021-12-08
56.7
best: 65.9 (PaLM-540B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Known Unknowns)
Accuracy · 2021-12-08
63.6
best: 73.9 (PaLM-540B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Date Understanding)
Accuracy · 2021-12-08
44.1
best: 91.2 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench (Logical Sequence)
Accuracy · 2021-12-08
36.4
best: 64.1 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
63.6
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
69.7
best: 86.86 (Orca 2-13B)
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Word Sense Disambiguation on BIG-bench (Anachronisms)
Accuracy · 2021-12-08
56.4
best: 69.1 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Sarcasm Detection on BIG-bench (SNARKS)
Accuracy · 2021-12-08
48.3
best: 84.8 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Emotional Intelligence on BIG-bench
Accuracy · 2021-12-08
83.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Memorization on BIG-bench (Hindu Knowledge)
Accuracy · 2021-12-08
80
best: 95.4 (PaLM-540B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
36.4
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
41.4
best: 78 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
62
best: 78 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
57.6
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
64.1
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
52.7
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
27.3
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
50.7
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
61.4
best: 78 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
81.8
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Reading Comprehension on BIG-bench
Accuracy · 2021-12-08
75.1
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
68.2
best: 86.86 (Orca 2-13B)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
11.7
best: 86.86 (Orca 2-13B)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
52.5
best: 86.86 (Orca 2-13B)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
50.9
best: 86.86 (Orca 2-13B)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
56.8
best: 86.86 (Orca 2-13B)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Common Sense Reasoning on BIG-bench
Accuracy · 2021-12-08
39.6
best: 63.6
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446

Methodology · 18 results

Logical Reasoning on BIG-bench (Penguins In A Table)
Accuracy · 2021-12-08
40.6
best: 84.9 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench (Logic Grid Puzzle)
Accuracy · 2021-12-08
35.1
best: 44 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench (Temporal Sequences)
Accuracy · 2021-12-08
19
best: 100 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench (Formal Fallacies Syllogisms Negation)
Accuracy · 2021-12-08
50.7
best: 64.8 (PaLM 2 (few-shot, k=3, Direct))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench (Reasoning About Colored Objects)
Accuracy · 2021-12-08
49.2
best: 91.2 (PaLM 2 (few-shot, k=3, CoT))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench (Logical Fallacy Detection)
Accuracy · 2021-12-08
58.9
best: 72.1 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench (StrategyQA)
Accuracy · 2021-12-08
61
best: 73.9 (PaLM-540B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
89.5
best: 94 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
59.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
BIG-bench Machine Learning on BIG-bench
Accuracy · 2021-12-08
41.1
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
59.7
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
56.4
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
33.6
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
59.3
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
53
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
16.7
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · 2021-12-08
34
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Logical Reasoning on BIG-bench
Accuracy · uses extra data · 2021-12-08
37
best: 94 (Chinchilla-70B (few-shot, k=5))
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446

Knowledge Base · 5 results

Mathematical Reasoning on BIG-bench
Accuracy · 2021-12-08
35.7
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Mathematical Reasoning on BIG-bench
Accuracy · 2021-12-08
57.6
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Mathematical Reasoning on BIG-bench
Accuracy · 2021-12-08
25
best: 35.7
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Mathematical Reasoning on BIG-bench
Accuracy · 2021-12-08
23.7
best: 35.7
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Mathematical Reasoning on BIG-bench
Accuracy · 2021-12-08
44.3
best: 57.6
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446

Reasoning · 3 results

Analogical Similarity on BIG-bench
Accuracy · 2021-12-08
17.2
best: 38.1 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Identify Odd Metaphor on BIG-bench
Accuracy · 2021-12-08
38.6
best: 68.8 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446
Odd One Out on BIG-bench
Accuracy · 2021-12-08
32.5
best: 70.9 (Chinchilla-70B (few-shot, k=5))
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446

Medical · 1 result

Medical Genetics on BIG-bench
Accuracy · 2021-12-08
69
SOTA
Scaling Language Models: Methods, Analysis & Insights from Training Gopher arXiv:2112.11446