Metric: # Correct Groups (higher is better)
| Rank | Model | # Correct Groups | Augmentations | Paper | Date |
|---|---|---|---|---|---|
| 1 | Human Performance | 1405 | Yes | Large Language Models are Fixated by Red Herring... | 2023-06-19 |
| 2 | GPT-4 (3-shot) | 272 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 3 | GPT-4 (5-shot) | 269 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 4 | GPT-4 (1-shot) | 262 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 5 | GPT-4 (100-shot) | 249 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 6 | GPT-4 (0-shot) | 239 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 7 | GPT-3.5-turbo (5-shot) | 149 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 8 | GPT-3.5-turbo (3-shot) | 140 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 9 | GPT-3.5-turbo (10-shot) | 137 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 10 | GPT-3.5-turbo (1-shot) | 123 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 11 | GPT-3.5-turbo (0-shot) | 114 | Yes | GPT-4 Technical Report | 2023-03-15 |