DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

Yuxuan Tong, Xiwen Zhang, Rui Wang, Ruidong Wu, Junxian He

2024-06-18

Math, Math Word Problem Solving, Natural Questions, Mathematical Problem-Solving, Arithmetic Reasoning

Abstract

Solving mathematical problems requires advanced reasoning abilities and presents notable challenges for large language models. Previous works usually synthesize data from proprietary models to augment existing datasets, followed by instruction tuning to achieve top-tier results. However, our analysis of these datasets reveals severe biases towards easy queries, with frequent failures to generate any correct response for the most challenging queries. Hypothesizing that difficult queries are crucial for learning complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates more trials to difficult queries during the synthesis phase, enabling more extensive training on difficult samples. Using DART, we have created new datasets for mathematical problem-solving that focus more on difficult queries and are substantially smaller than previous ones. Remarkably, our synthesis process relies solely on a 7B-sized open-weight model, rather than the commonly used proprietary GPT-4. We fine-tune various base models, ranging from 7B to 70B in size, on our datasets, resulting in a series of strong models called DART-MATH. In comprehensive in-domain and out-of-domain evaluation on 6 mathematical benchmarks, DART-MATH significantly outperforms vanilla rejection tuning and is superior or comparable to prior art, despite using much smaller datasets and no proprietary models. Furthermore, our results position our synthetic datasets as the most effective and cost-efficient publicly available resources for advancing mathematical problem-solving.
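
The contrast the abstract draws between vanilla rejection sampling and difficulty-aware sampling can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy sampler, and the per-query target and trial-cap parameters are all hypothetical, and a real pipeline would call a language model and check its answer where the toy sampler is used here. Reading uniform per-query targets as the "Uniform" variant and difficulty-proportional targets as the "Prop2Diff" variant named in the results table is likewise an assumption of this sketch.

```python
from typing import Callable, Dict, List


def make_toy_sampler(success_every: Dict[str, int]) -> Callable[[str], bool]:
    """Hypothetical stand-in for 'generate a response and check its answer'.

    For query q, every success_every[q]-th trial counts as correct, so a
    larger value models a harder query. Deterministic for reproducibility.
    """
    counts = {q: 0 for q in success_every}

    def sample_is_correct(query: str) -> bool:
        counts[query] += 1
        return counts[query] % success_every[query] == 0

    return sample_is_correct


def vanilla_rejection_sampling(
    queries: List[str],
    sample_is_correct: Callable[[str], bool],
    trials_per_query: int,
) -> Dict[str, int]:
    """Give every query the same number of trials; keep only correct responses."""
    return {
        q: sum(sample_is_correct(q) for _ in range(trials_per_query))
        for q in queries
    }


def difficulty_aware_sampling(
    queries: List[str],
    sample_is_correct: Callable[[str], bool],
    target_correct: Dict[str, int],
    max_trials: int,
) -> Dict[str, int]:
    """Keep sampling each query until it reaches its target number of correct
    responses (or hits the trial cap), so hard queries receive more trials."""
    kept = {}
    for q in queries:
        correct = trials = 0
        while correct < target_correct[q] and trials < max_trials:
            trials += 1
            if sample_is_correct(q):
                correct += 1
        kept[q] = correct
    return kept


if __name__ == "__main__":
    queries = ["easy", "hard"]
    difficulty = {"easy": 1, "hard": 20}  # hard query: 1 correct per 20 trials

    vanilla = vanilla_rejection_sampling(queries, make_toy_sampler(difficulty), 10)
    dart = difficulty_aware_sampling(
        queries, make_toy_sampler(difficulty), {"easy": 5, "hard": 5}, 1000
    )
    print("vanilla:", vanilla)
    print("difficulty-aware:", dart)
```

In this toy setting, a fixed budget of 10 trials per query yields no correct response for the hard query, mirroring the bias toward easy queries described above, whereas the difficulty-aware loop spends more trials on the hard query until it has as many kept responses as the easy one.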

Results

Task	Dataset	Metric	Value	Model
Question Answering	MATH	Accuracy	56.1	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	54.9	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	53.6	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	52.9	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	46.6	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	45.5	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	45.3	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Accuracy	43.5	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Question Answering	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	56.1	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	54.9	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	53.6	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	52.9	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	46.6	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	45.5	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	45.3	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Accuracy	43.5	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Math Word Problem Solving	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	56.1	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	54.9	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	53.6	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	52.9	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	46.6	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	45.5	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	45.3	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Accuracy	43.5	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Mathematical Question Answering	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	56.1	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	54.9	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	70	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	53.6	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	52.9	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	7	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	46.6	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	45.5	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	45.3	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	8	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Accuracy	43.5	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Mathematical Reasoning	MATH	Parameters (Billions)	7	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	90.4	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	70	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	89.6	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	70	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	88.2	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	7	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	86.8	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	7	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	82.6	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	7	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	82.5	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	8	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	81.1	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	7	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Accuracy	81.1	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
Arithmetic Reasoning	GSM8K	Parameters (Billions)	8	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	32.5	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	32.2	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	28.2	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	27.4	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	19.4	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	17	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	16.4	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)
General Knowledge	TheoremQA	Accuracy	15.4	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)

