Qwen2(CoT + Code Interpreter)
Reported on 4 benchmarks across 4 tasks
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Knowledge Base (2 results)
- Execution Accuracy: 92.3 (best: 93.9, GPT-4 (Teaching-Inspired))
- Execution Accuracy: 92.3 (best: 93.9, GPT-4 (Teaching-Inspired))
Natural Language Processing (1 result)
- Execution Accuracy: 92.3 (best: 93.9, GPT-4 (Teaching-Inspired))
Reasoning (1 result)
- Execution Accuracy: 92.3 (best: 93.9, GPT-4 (Teaching-Inspired))