Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Multi-Task Learning on BBH-alg

Metric: Average (%) (higher is better)
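The leaderboard score is presumably a macro-average of per-subtask scores (each in percent) across the BBH-alg subtasks; models are then ranked by this average, higher being better. A minimal sketch of that assumed aggregation, with hypothetical subtask scores:

```python
def average_percent(task_scores):
    """Macro-average a list of per-subtask scores, each in [0, 100]."""
    return sum(task_scores) / len(task_scores)

# Hypothetical per-subtask scores for one model
scores = [80.0, 60.0, 70.0]
print(round(average_percent(scores), 1))  # -> 70.0
```

Note this is an unweighted average: every subtask contributes equally regardless of how many examples it contains.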


Results

| # | Model | Average (%) | Augmentations | Paper | Date | Code |
|---|-------|-------------|---------------|-------|------|------|
| 1 | code-davinci-002 175B (CoT) | 73.9 | No | Evaluating Large Language Models Trained on Code | 2021-07-07 | Code |
| 2 | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 66.5 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 3 | PaLM 540B (CoT + self-consistency) | 62.2 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 4 | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 61.3 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 5 | PaLM 540B (CoT) | 57.6 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 6 | Flan-PaLM 540B (3-shot, fine-tuned) | 48.2 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 7 | PaLM 540B | 38.3 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |