Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Gopher-280B (few-shot, k=5)

Reported on 74 benchmarks across 52 tasks · 1 paper · 73 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
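
The exact-name matching described in the note can be illustrated with a minimal sketch. The rows below are scores listed on this page; the tuple layout and the helper name `group_by_exact_name` are assumptions for illustration, not the site's actual implementation.

```python
from collections import defaultdict

# (task, model name, accuracy) rows copied from this page.
# The tuple layout is a hypothetical choice for this sketch.
ROWS = [
    ("Fact Checking", "Gopher-280B (few-shot, k=5)", 61.7),
    ("Fact Checking", "Gopher-280B (few-shot, k=10)", 77.5),
    ("General Knowledge", "Gopher-280B (few-shot, k=5)", 93.9),
    ("General Knowledge", "Chinchilla-70B (few-shot, k=5)", 94.3),
]


def group_by_exact_name(rows):
    """Group rows by the model-name string, compared character for character."""
    grouped = defaultdict(list)
    for task, model, accuracy in rows:
        grouped[model].append((task, accuracy))
    return dict(grouped)


by_model = group_by_exact_name(ROWS)
k5_results = by_model["Gopher-280B (few-shot, k=5)"]
```

Under this kind of matching, the k=5 and k=10 settings form separate entries, and a bare "Gopher-280B" key matches neither of them.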

Miscellaneous (45 results)

  • Ethics on BIG-bench
    Accuracy · 2021-12-08
    70
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • General Knowledge on BIG-bench
    Accuracy · 2021-12-08
    93.9
    best: 94.3 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School European History on BIG-bench
    Accuracy · 2021-12-08
    72.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School US History on BIG-bench
    Accuracy · 2021-12-08
    78.9
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School World History on BIG-bench
    Accuracy · 2021-12-08
    75.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • International Law on BIG-bench
    Accuracy · 2021-12-08
    77.7
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Jurisprudence on BIG-bench
    Accuracy · 2021-12-08
    71.3
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Fallacies on BIG-bench
    Accuracy · 2021-12-08
    72.4
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Management on BIG-bench
    Accuracy · 2021-12-08
    77.7
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Marketing on BIG-bench
    Accuracy · 2021-12-08
    83.3
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Philosophy on BIG-bench
    Accuracy · 2021-12-08
    68.8
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Prehistory on BIG-bench
    Accuracy · 2021-12-08
    67.6
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Professional Law on BIG-bench
    Accuracy · 2021-12-08
    44.5
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • World Religions on BIG-bench
    Accuracy · 2021-12-08
    84.2
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Anatomy on BIG-bench
    Accuracy · 2021-12-08
    56.3
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Clinical Knowledge on BIG-bench
    Accuracy · 2021-12-08
    67.2
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • College Medicine on BIG-bench
    Accuracy · 2021-12-08
    60.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Human Aging on BIG-bench
    Accuracy · 2021-12-08
    66.4
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Human Organs Senses Multiple Choice on BIG-bench
    Accuracy · 2021-12-08
    84.8
    best: 85.7 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Nutrition on BIG-bench
    Accuracy · 2021-12-08
    69.9
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Professional Medicine on BIG-bench
    Accuracy · 2021-12-08
    64
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Virology on BIG-bench
    Accuracy · 2021-12-08
    47
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Econometrics on BIG-bench
    Accuracy · 2021-12-08
    43
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School Geography on BIG-bench
    Accuracy · 2021-12-08
    76.8
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School Government and Politics on BIG-bench
    Accuracy · 2021-12-08
    83.9
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School Macroeconomics on BIG-bench
    Accuracy · 2021-12-08
    65.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School Microeconomics on BIG-bench
    Accuracy · 2021-12-08
    66.4
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • High School Psychology on BIG-bench
    Accuracy · 2021-12-08
    81.8
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Human Sexuality on BIG-bench
    Accuracy · 2021-12-08
    67.2
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Professional Psychology on BIG-bench
    Accuracy · 2021-12-08
    68.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Public Relations on BIG-bench
    Accuracy · 2021-12-08
    71.8
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Security Studies on BIG-bench
    Accuracy · 2021-12-08
    64.9
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Sociology on BIG-bench
    Accuracy · 2021-12-08
    84.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • US Foreign Policy on BIG-bench
    Accuracy · 2021-12-08
    81
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Intent Recognition on BIG-bench
    Accuracy · 2021-12-08
    88.7
    best: 92.8 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Astronomy on BIG-bench
    Accuracy · 2021-12-08
    65.8
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Computer Security on BIG-bench
    Accuracy · 2021-12-08
    65
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Ethics on BIG-bench
    Accuracy · 2021-12-08
    40.2
    best: 70
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Ethics on BIG-bench
    Accuracy · 2021-12-08
    55.1
    best: 70
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Ethics on BIG-bench
    Accuracy · 2021-12-08
    66.8
    best: 70
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Fact Checking on BIG-bench
    Accuracy · 2021-12-08
    61.7
    best: 77.5 (Gopher-280B (few-shot, k=10))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Fact Checking on BIG-bench
    Accuracy · 2021-12-08
    69.1
    best: 77.5 (Gopher-280B (few-shot, k=10))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • General Knowledge on BIG-bench
    Accuracy · 2021-12-08
    75.7
    best: 94.3 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • General Knowledge on BIG-bench
    Accuracy · 2021-12-08
    81.8
    best: 94.3 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • General Knowledge on BIG-bench
    Accuracy · 2021-12-08
    38
    best: 94.3 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)

Natural Language Processing (37 results)

  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    88.7
    best: 94 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    71.6
    best: 78 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Question Answering on BIG-bench (Novel Concepts)
    Accuracy · 2021-12-08
    59.1
    best: 71.9 (PaLM-540B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Question Answering on BIG-bench (Movie Recommendation)
    Accuracy · 2021-12-08
    50.5
    best: 94.4 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Question Answering on BIG-bench (Navigate)
    Accuracy · 2021-12-08
    51.1
    best: 91.2 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Question Answering on BIG-bench (Ruin Names)
    Accuracy · 2021-12-08
    38.6
    best: 90 (PaLM 2 (few-shot, k=3, Direct))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Question Answering on BIG-bench (Hyperbaton)
    Accuracy · 2021-12-08
    51.7
    best: 92 (Bloomberg GPT (few-shot, k=3))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Causal Judgment)
    Accuracy · 2021-12-08
    50.8
    best: 62 (PaLM 2 (few-shot, k=3, Direct))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Disambiguation QA)
    Accuracy · 2021-12-08
    45.5
    best: 78.8 (PaLM 2 (few-shot, k=3, Direct))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Sports Understanding)
    Accuracy · 2021-12-08
    54.9
    best: 98 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Winowhy)
    Accuracy · 2021-12-08
    56.7
    best: 65.9 (PaLM-540B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Known Unknowns)
    Accuracy · 2021-12-08
    63.6
    best: 73.9 (PaLM-540B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Date Understanding)
    Accuracy · 2021-12-08
    44.1
    best: 91.2 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench (Logical Sequence)
    Accuracy · 2021-12-08
    36.4
    best: 64.1 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    63.6
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    69.7
    best: 86.86 (Orca 2-13B)
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Word Sense Disambiguation on BIG-bench (Anachronisms)
    Accuracy · 2021-12-08
    56.4
    best: 69.1 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Sarcasm Detection on BIG-bench (SNARKS)
    Accuracy · 2021-12-08
    48.3
    best: 84.8 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Emotional Intelligence on BIG-bench
    Accuracy · 2021-12-08
    83.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Memorization on BIG-bench (Hindu Knowledge)
    Accuracy · 2021-12-08
    80
    best: 95.4 (PaLM-540B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    36.4
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    41.4
    best: 78 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    62
    best: 78 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    57.6
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    64.1
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    52.7
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    27.3
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    50.7
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    61.4
    best: 78 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    81.8
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Reading Comprehension on BIG-bench
    Accuracy · 2021-12-08
    75.1
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    68.2
    best: 86.86 (Orca 2-13B)
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    11.7
    best: 86.86 (Orca 2-13B)
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    52.5
    best: 86.86 (Orca 2-13B)
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    50.9
    best: 86.86 (Orca 2-13B)
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    56.8
    best: 86.86 (Orca 2-13B)
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Common Sense Reasoning on BIG-bench
    Accuracy · 2021-12-08
    39.6
    best: 63.6
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)

Methodology (18 results)

  • Logical Reasoning on BIG-bench (Penguins In A Table)
    Accuracy · 2021-12-08
    40.6
    best: 84.9 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench (Logic Grid Puzzle)
    Accuracy · 2021-12-08
    35.1
    best: 44 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench (Temporal Sequences)
    Accuracy · 2021-12-08
    19
    best: 100 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench (Formal Fallacies Syllogisms Negation)
    Accuracy · 2021-12-08
    50.7
    best: 64.8 (PaLM 2 (few-shot, k=3, Direct))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench (Reasoning About Colored Objects)
    Accuracy · 2021-12-08
    49.2
    best: 91.2 (PaLM 2 (few-shot, k=3, CoT))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench (Logical Fallacy Detection)
    Accuracy · 2021-12-08
    58.9
    best: 72.1 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench (StrategyQA)
    Accuracy · 2021-12-08
    61
    best: 73.9 (PaLM-540B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    89.5
    best: 94 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    59.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • BIG-bench Machine Learning on BIG-bench
    Accuracy · 2021-12-08
    41.1
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    59.7
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    56.4
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    33.6
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    59.3
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    53
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    16.7
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    34
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Logical Reasoning on BIG-bench
    Accuracy · uses extra data · 2021-12-08
    37
    best: 94 (Chinchilla-70B (few-shot, k=5))
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)

Knowledge Base (5 results)

  • Mathematical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    35.7
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Mathematical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    57.6
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Mathematical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    25
    best: 35.7
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Mathematical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    23.7
    best: 35.7
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Mathematical Reasoning on BIG-bench
    Accuracy · 2021-12-08
    44.3
    best: 57.6
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)

Reasoning (3 results)

  • Analogical Similarity on BIG-bench
    Accuracy · 2021-12-08
    17.2
    best: 38.1 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Identify Odd Metaphor on BIG-bench
    Accuracy · 2021-12-08
    38.6
    best: 68.8 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)
  • Odd One Out on BIG-bench
    Accuracy · 2021-12-08
    32.5
    best: 70.9 (Chinchilla-70B (few-shot, k=5))
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)

Medical (1 result)

  • Medical Genetics on BIG-bench
    Accuracy · 2021-12-08
    69
    SOTA
    Scaling Language Models: Methods, Analysis & Insights from Training Gopher (arXiv:2112.11446)