Metric: Counterfactual-per ques. (higher is better)
| # | Model | Counterfactual-per ques. | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | AI Core | 90.72 | No | Think before You Simulate: Symbolic Reasoning to... | 2025-06-12 | Code |
| 2 | VRDP | 84.29 | No | - | - | - |
| 3 | redherring | 80.05 | No | - | - | - |
| 4 | neural | 75.61 | No | - | - | - |
| 5 | Fighttttt | 75.35 | No | - | - | - |
| 6 | NERV | 74.89 | No | - | - | - |
| 7 | rnn_dyn | 51.07 | No | - | - | - |
| 8 | troublesolver | 50.89 | No | - | - | - |
| 9 | v0.1 | 49.77 | No | - | - | - |
| 10 | DCL | 46.52 | No | - | - | - |
| 11 | TS_NS_IMPERIAL | 44.60 | No | - | - | - |
| 12 | First_test | 42.23 | No | - | - | - |
| 13 | epoch 9 pgd_25_0.1_eps | 25.89 | No | - | - | - |