MathEquiv

mathematical statement equivalence

Texts | Apache 2.0 | Introduced 2025-05-22

The MathEquiv dataset accompanies EquivPruner. It is designed specifically for mathematical statement equivalence and serves as a versatile resource for a variety of mathematical tasks and scenarios. It consists of almost 100k pairs of mathematical sentences, each annotated with an equivalence result and reasoning steps generated by GPT-4o.

The dataset consists of three splits:

  • train with 77.6k samples for training.
  • test with 9.83k samples for testing.
  • valid with 9.75k samples for validation.
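
As a quick sanity check on the split sizes listed above, the sketch below sums the rounded counts and computes each split's share. The counts are the rounded figures from this page, not exact row counts, and the names `SPLIT_SIZES` and `split_fractions` are illustrative only.

```python
# Rounded split sizes as reported on this page (assumption: the exact
# row counts differ slightly from these three-significant-figure values).
SPLIT_SIZES = {"train": 77_600, "test": 9_830, "valid": 9_750}


def split_fractions(sizes):
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total (rounded): {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

With these rounded figures the total comes to about 97k, consistent with "almost 100k", and the splits fall at roughly 80/10/10.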

We implemented a five-tiered classification system. This granular approach was adopted to enhance the stability of the GPT model's outputs, as preliminary experiments with binary classification (equivalent/non-equivalent) revealed inconsistencies in judgments. The five-tiered system yielded significantly more consistent and reliable assessments:

  • Level 4 (Exactly Equivalent): The statements are mathematically interchangeable in all respects, exhibiting identical meaning and form.
  • Level 3 (Likely Equivalent): Minor syntactic differences may be present, but the core mathematical content and logic align.
  • Level 2 (Indeterminable): Insufficient information is available to make a definitive judgment regarding equivalence.
  • Level 1 (Unlikely Equivalent): While some partial agreement may exist, critical discrepancies in logic, definition, or mathematical structure are observed.
  • Level 0 (Not Equivalent): The statements are fundamentally distinct in their mathematical meaning, derivation, or resultant outcomes.
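
The five levels above can be represented in code as an ordered enumeration. The following is a minimal sketch, not part of the dataset's tooling: the class name `EquivalenceLevel`, the helper `to_binary`, and especially the default threshold (treating Level 3 and above as equivalent) are assumptions made for illustration. The source does not state how, or whether, the levels are collapsed to a binary label.

```python
from enum import IntEnum


class EquivalenceLevel(IntEnum):
    """The five equivalence levels described above (names are illustrative)."""
    NOT_EQUIVALENT = 0
    UNLIKELY_EQUIVALENT = 1
    INDETERMINABLE = 2
    LIKELY_EQUIVALENT = 3
    EXACTLY_EQUIVALENT = 4


def to_binary(level, threshold=EquivalenceLevel.LIKELY_EQUIVALENT):
    """Collapse a five-level label into equivalent / not equivalent.

    Assumption: levels at or above `threshold` count as equivalent.
    The default threshold is a hypothetical choice, not taken from the source.
    Raises ValueError if `level` is not one of 0..4.
    """
    return EquivalenceLevel(level) >= threshold


for raw in range(5):
    level = EquivalenceLevel(raw)
    print(raw, level.name, to_binary(level))
```

Using `IntEnum` keeps the levels comparable as integers, so a stricter rule (for example, only Level 4 counts as equivalent) is a matter of passing a different `threshold`.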