Natural Language Inference on ANLI test
Metric: A2 (higher is better)
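The page does not define the metric. Assuming A2 denotes classification accuracy, in percent, on the round 2 test split of ANLI, a score could be computed roughly as in the sketch below. The function name, label strings, and toy data are hypothetical and do not come from the leaderboard.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of examples whose predicted label matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data: three-way NLI labels for four examples.
gold = ["entailment", "neutral", "contradiction", "neutral"]
predictions = ["entailment", "contradiction", "contradiction", "neutral"]

score = accuracy_percent(predictions, gold)
print(f"A2 = {score:.1f}")  # 3 of 4 correct, so this prints "A2 = 75.0"
```

Under this reading, higher is better because the value counts correct predictions, and a classifier guessing uniformly among three labels would be expected to score about 33.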
Results
| # | Model | A2 | Extra Data | Paper | Date | Code | SOTA badge |
|---|-------|----|------------|-------|------|------|------------|
| 1 | T5-3B (explanation prompting) | 72.5 | No | - | - | - | - |
| 2 | PaLM 540B (Self Improvement, Self Consistency) | 66.5 | No | Large Language Models Can Self-Improve | 2022-10-20 | - | SOTA |
| 3 | PaLM 540B (Self Improvement, CoT Prompting) | 65.3 | No | Large Language Models Can Self-Improve | 2022-10-20 | - | - |
| 4 | PaLM 540B (Self Improvement, Standard-Prompting) | 64.8 | No | Large Language Models Can Self-Improve | 2022-10-20 | - | - |
| 5 | PaLM 540B (Self Consistency) | 64.5 | No | Large Language Models Can Self-Improve | 2022-10-20 | - | - |
| 6 | PaLM 2-L (one-shot) | 63.4 | No | PaLM 2 Technical Report | 2023-05-17 | Yes | - |
| 7 | T0-11B (explanation prompting) | 60.6 | No | - | - | - | - |
| 8 | PaLM 540B (CoT Prompting) | 58.9 | No | Large Language Models Can Self-Improve | 2022-10-20 | - | - |
| 9 | PaLM 540B (Standard-Prompting) | 55.8 | No | Large Language Models Can Self-Improve | 2022-10-20 | - | - |
| 10 | ChatGPT | 52.6 | No | A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | 2023-05-29 | Yes | - |
| 11 | ALUM (RoBERTa-LARGE) | 52.1 | Yes | Adversarial Training for Large Neural Language Models | 2020-04-20 | Yes | SOTA |
| 12 | XLNet (Large) | 50.9 | Yes | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 2019-06-19 | Yes | SOTA |
| 13 | InfoBERT (RoBERTa) | 50.5 | Yes | InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective | 2020-10-05 | Yes | - |
| 14 | RoBERTa (Large) | 49.8 | Yes | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019-07-26 | Yes | - |
| 15 | PaLM 2-M (one-shot) | 49.5 | No | PaLM 2 Technical Report | 2023-05-17 | Yes | - |
| 16 | PaLM 2-S (one-shot) | 48.8 | No | PaLM 2 Technical Report | 2023-05-17 | Yes | - |
| 17 | T0-3B (CoT fine-tuned) | 37.2 | No | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | 2023-05-23 | Yes | - |
| 18 | Flipped-3B | 37.05 | No | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | 2022-10-06 | Yes | - |
| 19 | KiC-770M | 35 | No | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | 2022-10-28 | - | - |
| 20 | RoE-3B | 34.64 | No | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | 2023-02-07 | Yes | - |
| 21 | Bloomberg GPT (one-shot) | 34.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes | - |
| 22 | OPT 66B (one-shot) | 34.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes | - |
| 23 | GPT-3 | 34 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Yes | - |
| 24 | BLOOM 176B (one-shot) | 33.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes | - |
| 25 | GPT-NeoX (one-shot) | 33.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes | - |