Transfer Learning on BBH-nlp
Metric: Average (%) (higher is better)
Results
| # | Model | Average (%) | Augmentations | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Qwen2.5-72B | 86.3 | No | - | - | - |
| 2 | Jiutian-大模型 ("large model") | 86.1 | No | - | - | - |
| 3 | LLama-3-405B | 85.9 | No | - | - | - |
| 4 | Jiutian-57B | 84.07 | No | - | - | - |
| 5 | Qwen2-72B | 82.4 | No | - | - | - |
| 6 | LLama-3-70B | 81 | No | - | - | - |
| 7 | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 78.4 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 8 | PaLM 540B (CoT + self-consistency) | 78.2 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 9 | code-davinci-002 175B (CoT) | 73.5 | No | Evaluating Large Language Models Trained on Code | 2021-07-07 | Code |
| 10 | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 72.4 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 11 | PaLM 540B (CoT) | 71.2 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 12 | Flan-PaLM 540B (5-shot, finetuned) | 70 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 13 | PaLM 540B | 62.7 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 14 | Orca 2-13B | 50.18 | No | Orca 2: Teaching Small Language Models How to Reason | 2023-11-18 | - |
| 15 | Orca 2-7B | 45.93 | No | Orca 2: Teaching Small Language Models How to Reason | 2023-11-18 | - |

Entries marked with a SOTA badge on the page: Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) and code-davinci-002 175B (CoT).