Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Multi-Task Learning on MML

Metric: Average (%) (higher is better)
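
The leaderboard metric is an average of per-task scores, expressed as a percentage. A minimal sketch of how such a macro-average is computed, assuming equal weighting across tasks (the function name and task names below are illustrative, not from the source):

```python
def macro_average(task_scores: dict[str, float]) -> float:
    """Equal-weight average of per-task accuracy scores (%)."""
    return sum(task_scores.values()) / len(task_scores)

# Hypothetical per-task accuracies for one model:
scores = {"abstract_algebra": 30.0, "anatomy": 48.0, "astronomy": 62.5}
print(round(macro_average(scores), 1))  # prints 46.8
```

Because every task counts equally regardless of how many questions it contains, a model's ranking can differ from what a question-weighted (micro) average would give.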


Results

| # | Model | Average (%) | Augmentations | Paper | Date | Code |
|---|-------|-------------|---------------|-------|------|------|
| 1 | GPT-4 o1 (300B) | 87 | Yes | GPT-4o as the Gold Standard: A Scalable and Gene... | 2024-10-03 | - |
| 2 | Llama 3.1 (405B) | 86.6 | Yes | Llama 3 Meets MoE: Efficient Upcycling | 2024-12-13 | Code |
| 3 | Llama 3.1 (70B) | 86 | Yes | Llama 3 Meets MoE: Efficient Upcycling | 2024-12-13 | Code |
| 4 | Gemini Ultra (5-shot) | 83.7 | No | - | - | - |
| 5 | Claude 3 Sonnet (5-shot) | 79 | No | - | - | - |
| 6 | Qwen1.5 72B (5-shot) | 77.5 | No | - | - | - |
| 7 | Claude 3 Haiku (5-shot) | 75.2 | No | - | - | - |
| 8 | DBRX Instruct 132B (5-shot) | 73.7 | No | The Llama 3 Herd of Models | 2024-07-31 | Code |
| 9 | Llama 2 (65B) | 73.5 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 10 | Llama 3.1 8B (CoT) | 73 | Yes | The Llama 3 Herd of Models | 2024-07-31 | Code |
| 11 | Mixtral 8x7B (5-shot) | 70.6 | No | Mixtral of Experts | 2024-01-08 | Code |
| 12 | GPT-3.5 Turbo | 70 | Yes | GPT-4 Technical Report | 2023-03-15 | Code |
| 13 | LLaMA 65B (fine-tuned) | 68.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 14 | ChatGPT/GPT-3.5 (20B) | 67.5 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 15 | LLaMA 65B (5-shot) | 63.4 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 16 | LLaMA 2 34B (5-shot) | 62.6 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 17 | Mistral 7B (5-shot) | 62.5 | Yes | Mixtral of Experts | 2024-01-08 | Code |
| 18 | Mistral 7B (5-shot) | 60.1 | No | Mistral 7B | 2023-10-10 | Code |
| 19 | GPT-3 Davinci 175B (CoT) | 59.5 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 20 | LLaMA 33B (5-shot) | 57.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 21 | Falcon 40B | 57 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |
| 22 | Qwen 7B (5-shot) | 56.7 | No | - | - | - |
| 23 | LLaMA 2 13B (5-shot) | 54.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 24 | Branch-Train-MiX 4x7B (sampling top-1 experts) | 53.2 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 25 | GAL 120B (zero-shot) | 52.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 26 | Atlas (5-shot) | 47.9 | No | Atlas: Few-shot Learning with Retrieval Augmente... | 2022-08-05 | Code |
| 27 | Flan-T5-XL 3B (CoT) | 45.5 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 28 | LLaMA 2 7B (5-shot) | 45.3 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 29 | Flan-T5-Large 780M | 45.1 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 30 | GLM-130B | 44.8 | No | GLM-130B: An Open Bilingual Pre-trained Model | 2022-10-05 | Code |
| 31 | Flan-T5-Large 780M (CoT) | 40.5 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 32 | GPT-3 Davinci 175B (5-shot) | 39.7 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 33 | Bloomberg GPT 50B (5-shot) | 39.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 34 | UL2 20B (5-shot) | 39.2 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 35 | BLOOM 176B (5-shot) | 39.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 36 | phi-1.5-web 1.3B | 37.9 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 37 | OPT 66B (5-shot) | 36 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 38 | Flan-T5-Base 250M | 35.9 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 39 | Flan-T5-Base 250M (CoT) | 33.7 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 40 | GPT-NeoX 20B (5-shot) | 33.6 | No | GPT-NeoX-20B: An Open-Source Autoregressive Lang... | 2022-04-14 | Code |
| 41 | RWKV v5 Eagle 7B | 31 | No | - | - | - |
| 42 | LLaMA 7B MiLe-Loss (5-shot) | 29.68 | No | MiLe Loss: a New Loss for Mitigating the Bias of... | 2023-10-30 | Code |
| 43 | Flan-T5-Small 80M | 28.7 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 44 | Falcon 7B (5-shot) | 28 | No | The Falcon Series of Open Language Models | 2023-11-28 | - |