Metric: # Solved Walls (higher is better)
| # | Model | # Solved Walls | Augmentations | Paper | Date |
|---|---|---|---|---|---|
| 1 | Human Performance | 285 | Yes | Large Language Models are Fixated by Red Herring... | 2023-06-19 |
| 2 | GPT-4 (5-shot) | 7 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 3 | GPT-4 (0-shot) | 6 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 4 | GPT-4 (3-shot) | 5 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 5 | GPT-4 (1-shot) | 4 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 6 | GPT-4 (100-shot) | 3 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 7 | GPT-3.5-turbo (5-shot) | 2 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 8 | GPT-3.5-turbo (10-shot) | 2 | Yes | GPT-4 Technical Report | 2023-03-15 |
| 9 | GPT-3.5-turbo (3-shot) | 0 | Yes | - | - |
| 10 | GPT-3.5-turbo (1-shot) | 0 | Yes | - | - |
| 11 | GPT-3.5-turbo (0-shot) | 0 | Yes | - | - |