Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

General Knowledge on TheoremQA

Metric: Accuracy (higher is better)

Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4 (PoT) | 52.4 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 2 | GPT-4 (CoT) | 43.8 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 3 | GPT-3.5-turbo (PoT) | 35.6 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 4 | DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 5 | DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 6 | PaLM-2-unicorn (CoT) | 31.8 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 7 | GPT-3.5-turbo (CoT) | 30.2 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 8 | DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 9 | DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 10 | Claude-v1 (PoT) | 25.9 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 11 | Claude-v1 (CoT) | 24.9 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 12 | code-davinci-002 | 23.9 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 13 | Claude-instant (CoT) | 23.6 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 14 | text-davinci-003 | 22.8 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 15 | PaLM-2-bison (CoT) | 21 | No | TheoremQA: A Theorem-driven Question Answering d... | 2023-05-21 | Code |
| 16 | DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 17 | DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 18 | DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 19 | DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |