GSM8K on GSM8K

Metric: Accuracy (higher is better)
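
For illustration only, accuracy here can be read as the percentage of problems whose final numeric answer matches the reference. The sketch below assumes the predicted answer is the last number in the model output and that references follow the GSM8K convention of ending in `#### <number>`. The helper names `extract_final_number` and `accuracy` are hypothetical, and the papers listed on this leaderboard may score answers differently.

```python
import re


def extract_final_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    # Matches integers and decimals, allowing thousands separators such as 1,200.
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(predictions, references):
    """Percentage of predictions whose final number equals the reference's final number."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        pred_value = extract_final_number(pred)
        # A prediction with no number at all is counted as wrong.
        if pred_value is not None and pred_value == extract_final_number(ref):
            correct += 1
    return 100.0 * correct / len(references)
```

Calling `accuracy(model_outputs, reference_solutions)` on two equal-length lists of strings returns a value on the same 0 to 100 scale as the table.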

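| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | Xolver | 98.1 | No | Xolver: Multi-Agent Reasoning with Holistic Expe... | 2025-06-17 | Code |
| 2 | AlphaLLM (with MCTS) | 92 | No | Toward Self-Improvement of LLMs via Imagination,... | 2024-04-18 | Code |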