Metric: Average per question (higher is better)
| # | Model | Average per question | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | AI Core | 95.24 | No | Think before You Simulate: Symbolic Reasoning to... | 2025-06-12 | Code |
| 2 | redherring | 91.14 | No | - | - | - |
| 3 | VRDP | 90.24 | No | - | - | - |
| 4 | Fighttttt | 88.71 | No | - | - | - |
| 5 | neural | 88.27 | No | - | - | - |
| 6 | NERV | 88.05 | No | - | - | - |
| 7 | DCL | 75.52 | No | - | - | - |
| 8 | troublesolver | 73.30 | No | - | - | - |
| 9 | v0.1 | 73.10 | No | - | - | - |
| 10 | First_test | 69.65 | No | - | - | - |
| 11 | TS_NS_IMPERIAL | 69.21 | No | - | - | - |
| 12 | rnn_dyn | 67.57 | No | - | - | - |
| 13 | epoch 9 pgd_25_0.1_eps | 60.25 | No | - | - | - |