MathEquiv
mathematical statement equivalence
Texts | Apache 2.0 | Introduced 2025-05-22
The MathEquiv dataset accompanies EquivPruner. It is specifically designed for mathematical statement equivalence, serving as a versatile resource applicable to a variety of mathematical tasks and scenarios. It consists of almost 100k pairs of mathematical statements, each with an equivalence result and reasoning steps generated by GPT-4o.
The dataset consists of three splits:
- train with 77.6k problems for training.
- test with 9.83k samples for testing.
- valid with 9.75k samples for validation.
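As a quick sanity check on the split sizes listed above, the short Python sketch below adds them up and computes each split's share of the total. The counts are the rounded figures from this card, so the exact numbers in the released files may differ slightly; the names `SPLIT_SIZES` and `split_fractions` are illustrative and not part of the dataset.

```python
# Rounded split sizes as reported on this card (assumption: the exact
# counts in the released files may differ by a few samples).
SPLIT_SIZES = {"train": 77_600, "test": 9_830, "valid": 9_750}


def split_fractions(sizes):
    """Return each split's share of the total as a fraction in [0, 1]."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

for name, frac in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} samples ({frac:.1%})")
print(f"total: {total} samples")
```

With these rounded figures the total comes to about 97.2k pairs, which matches the "almost 100k" description, and the splits fall close to an 80/10/10 ratio.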
We implemented a five-tiered classification system. This granular approach was adopted to enhance the stability of the GPT model's outputs, as preliminary experiments with binary classification (equivalent/non-equivalent) revealed inconsistencies in judgments. The five-tiered system yielded significantly more consistent and reliable assessments:
- Level 4 (Exactly Equivalent): The statements are mathematically interchangeable in all respects, exhibiting identical meaning and form.
- Level 3 (Likely Equivalent): Minor syntactic differences may be present, but the core mathematical content and logic align.
- Level 2 (Indeterminable): Insufficient information is available to make a definitive judgment regarding equivalence.
- Level 1 (Unlikely Equivalent): While some partial agreement may exist, critical discrepancies in logic, definition, or mathematical structure are observed.
- Level 0 (Not Equivalent): The statements are fundamentally distinct in their mathematical meaning, derivation, or resultant outcomes.
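The five levels above can be represented as an integer enumeration, which makes it easy to collapse them into a binary decision when a downstream task needs one. The sketch below is only an illustration: the names `EquivalenceLevel` and `to_binary` are hypothetical, and the default cutoff at Level 3 is an assumption, not a rule stated by the dataset authors.

```python
from enum import IntEnum


class EquivalenceLevel(IntEnum):
    """The five levels described on this card, ordered from 0 to 4."""
    NOT_EQUIVALENT = 0
    UNLIKELY_EQUIVALENT = 1
    INDETERMINABLE = 2
    LIKELY_EQUIVALENT = 3
    EXACTLY_EQUIVALENT = 4


def to_binary(level, threshold=EquivalenceLevel.LIKELY_EQUIVALENT):
    """Collapse a five-level label into equivalent / not equivalent.

    The default threshold (Level 3 and above counts as equivalent) is an
    assumption made for this sketch; a stricter pipeline could pass
    EquivalenceLevel.EXACTLY_EQUIVALENT instead.
    """
    # EquivalenceLevel(level) raises ValueError for labels outside 0..4.
    return EquivalenceLevel(level) >= threshold


labels = [4, 3, 2, 1, 0]
print([to_binary(label) for label in labels])
```

Keeping the threshold as a parameter leaves the treatment of Level 3 (and of the indeterminable Level 2) as an explicit choice rather than hiding it inside the label mapping.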