Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BIG-bench

Beyond the Imitation Game Benchmark

Texts | Apache License 2.0 | Introduced 2022-06-09

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.

Image source: https://arxiv.org/pdf/2206.04615.pdf

Benchmarks

Analogical Similarity/Accuracy
Anatomy/Accuracy
Astronomy/Accuracy
BIG-bench Machine Learning/Accuracy
Clinical Knowledge/Accuracy
College Medicine/Accuracy
Common Sense Reasoning/Accuracy
Common Sense Reasoning/Accuracy
Computer Security/Accuracy
Econometrics/Accuracy
Emotional Intelligence/Accuracy
Ethics/Accuracy
Fact Checking/Accuracy
General Knowledge/Accuracy
High School European History/Accuracy
High School Geography/Accuracy
High School Government and Politics/Accuracy
High School Macroeconomics/Accuracy
High School Microeconomics/Accuracy
High School Psychology/Accuracy
High School US History/Accuracy
High School World History/Accuracy
Human Aging/Accuracy
Human Organs Senses Multiple Choice/Accuracy
Human Sexuality/Accuracy
Identify Odd Metaphor/Accuracy
Intent Recognition/Accuracy
International Law/Accuracy
Jurisprudence/Accuracy
Logical Fallacies/Accuracy
Logical Reasoning/Accuracy
Logical Reasoning/Accuracy
Management/Accuracy
Marketing/Accuracy
Mathematical Reasoning/Accuracy
Mathematical Reasoning/Accuracy
Medical Genetics/Accuracy
Nutrition/Accuracy
Odd One Out/Accuracy
Philosophy/Accuracy
Prehistory/Accuracy
Professional Law/Accuracy
Professional Medicine/Accuracy
Professional Psychology/Accuracy
Public Relations/Accuracy
Reading Comprehension/Accuracy
Reading Comprehension/Accuracy
Security Studies/Accuracy
Sociology/Accuracy
US Foreign Policy/Accuracy
Virology/Accuracy
World Religions/Accuracy

Related Benchmarks

BIG-bench (Anachronisms)/Word Sense Disambiguation/Accuracy
BIG-bench (Causal Judgment)/Common Sense Reasoning/Accuracy
BIG-bench (Date Understanding)/Common Sense Reasoning/Accuracy
BIG-bench (Disambiguation QA)/Common Sense Reasoning/Accuracy
BIG-bench (Formal Fallacies Syllogisms Negation)/Logical Reasoning/Accuracy
BIG-bench (Hindu Knowledge)/Memorization/Accuracy
BIG-bench (Hyperbaton)/Question Answering/Accuracy
BIG-bench (Known Unknowns)/Common Sense Reasoning/Accuracy
BIG-bench (Logic Grid Puzzle)/Logical Reasoning/Accuracy
BIG-bench (Logical Fallacy Detection)/Logical Reasoning/Accuracy
BIG-bench (Logical Sequence)/Common Sense Reasoning/Accuracy
BIG-bench (Movie Recommendation)/Question Answering/Accuracy
BIG-bench (Navigate)/Question Answering/Accuracy
BIG-bench (Novel Concepts)/Question Answering/Accuracy
BIG-bench (Penguins In A Table)/Logical Reasoning/Accuracy
BIG-bench (Reasoning About Colored Objects)/Logical Reasoning/Accuracy
BIG-bench (Ruin Names)/Question Answering/Accuracy
BIG-bench (SNARKS)/Sarcasm Detection/Accuracy
BIG-bench (Sports Understanding)/Common Sense Reasoning/Accuracy
BIG-bench (StrategyQA)/Logical Reasoning/Accuracy
BIG-bench (Temporal Sequences)/Logical Reasoning/Accuracy
BIG-bench (Winowhy)/Common Sense Reasoning/Accuracy
BIG-bench-lite/Language Modelling/Accuracy
Big-bench Lite/Auto Debugging/Exact string match

Statistics

Papers: 349
Benchmarks: 52

Tasks

Abstract Algebra
Analogical Similarity
Analytic Entailment
Anatomy
Astronomy
Auto Debugging
BIG-bench Machine Learning
Business Ethics
Clinical Knowledge
College Biology
College Chemistry
College Computer Science
College Mathematics
College Medicine
College Physics
Common Sense Reasoning
Computer Security
Conceptual Physics
Crash Blossom
Crass AI
Dark Humor Detection
Discourse Marker Prediction
Econometrics
Electrical Engineering
Elementary Mathematics
Emotional Intelligence
Empirical Judgments
English Proverbs
Entailed Polarity
Epistemic Reasoning
Ethics
Evaluating Information Essentiality
FEVER (2-way)
FEVER (3-way)
Fact Checking
Fantasy Reasoning
Figure Of Speech Detection
Formal Logic
GRE Reading Comprehension
General Knowledge
Global Facts
High School Biology
High School Chemistry
High School Computer Science
High School European History
High School Geography
High School Government and Politics
High School Macroeconomics
High School Mathematics
High School Microeconomics
High School Physics
High School Psychology
High School Statistics
High School US History
High School World History
Human Aging
Human Organs Senses Multiple Choice
Human Sexuality
Identify Odd Metaphor
Implicatures
Implicit Relations
Intent Recognition
International Law
Irony Identification
Jurisprudence
LAMBADA
Language Modelling
Logical Args
Logical Fallacies
Logical Reasoning
Management
Marketing
Mathematical Induction
Mathematical Reasoning
Medical Genetics
Memorization
Metaphor Boolean
Miscellaneous
Misconceptions
Moral Disputes
Moral Permissibility
Moral Scenarios
Movie Dialog Same Or Different
Multi-task Language Understanding
Multiple Choice Question Answering (MCQA)
Natural Questions
Nonsense Words Grammar
Nutrition
Odd One Out
Philosophy
Phrase Relatedness
Physical Intuition
Physics MC
Prehistory
Presuppositions As NLI
Professional Accounting
Professional Law
Professional Medicine
Professional Psychology
Public Relations
Question Selection
RACE-h
RACE-m
Reading Comprehension
Riddle Sense
Sarcasm Detection
Security Studies
Sentence Ambiguity
Similarities Abstraction
Sociology
Timedial
TriviaQA
US Foreign Policy
Understanding Fables
Virology
Word Sense Disambiguation
World Religions