epoch 9 pgd_25_0.1_eps
Reported on 8 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Reasoning: 8 results
- Average-per ques.: 60.25 (best: 95.24, AI Core)
- Counterfactual-per opt.: 66.65 (best: 96.61, AI Core)
- Counterfactual-per ques.: 25.89 (best: 90.72, AI Core)
- Descriptive: 81.39 (best: 96.46, AI Core)
- Explanatory-per opt.: 83.42 (best: 99.94, AI Core)
- Explanatory-per ques.: 72.78 (best: 99.81, AI Core)
- Predictive-per opt.: 78.5 (best: 95.69, redherring)
- Predictive-per ques.: 60.95 (best: 93.96, AI Core)