Mathematical Formula Retrieval
MFR (Mathematical Formula Retrieval)
Texts · cc-by-4.0 · Introduced 2025-02-28
A mathematical dataset based on 71 famous mathematical identities. Each entry consists of two identities (in formula or textual form) and a label indicating whether the two versions describe the same mathematical identity. The false pairs are not chosen at random; they are made intentionally hard by modifying equivalent representations of the identity (see ddrg/named_math_formulas for more information). At most 400,000 versions are generated per identity. There are ten times more falsified versions than true ones, so the dataset can be used for training in which the false examples change every epoch.
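One way to exploit the 10:1 ratio of false to true pairs is to keep every true pair in each epoch and rotate through disjoint slices of the false pairs. The following is a minimal sketch of that idea, not the dataset authors' training procedure: the function name `epoch_examples`, the `(first, second, label)` tuple layout, and the toy data are all hypothetical and do not reflect the actual column names of the dataset.

```python
import random


def epoch_examples(true_pairs, false_pairs, epoch, seed=0):
    """Return all true pairs plus an epoch-specific slice of the false pairs.

    Hypothetical helper: each pair is a (first, second, label) tuple.
    """
    n_true = len(true_pairs)
    if n_true == 0:
        return []
    # Shuffle the false pairs once with a fixed seed so that the slices
    # taken in different epochs are disjoint and reproducible.
    shuffled = list(false_pairs)
    random.Random(seed).shuffle(shuffled)
    # With ten times more false pairs than true ones this gives ten slices.
    n_slices = max(1, len(shuffled) // n_true)
    start = (epoch % n_slices) * n_true
    negatives = shuffled[start:start + n_true]
    # Mix positives and negatives in an epoch-dependent order.
    examples = list(true_pairs) + negatives
    random.Random(seed + epoch + 1).shuffle(examples)
    return examples


# Toy stand-in data: 3 true pairs and 30 false pairs (the 1:10 ratio).
true_pairs = [(f"identity {i}", f"equivalent form {i}", True) for i in range(3)]
false_pairs = [(f"identity {i % 3}", f"falsified form {i}", False) for i in range(30)]

for epoch in range(3):
    batch = epoch_examples(true_pairs, false_pairs, epoch)
    n_false = sum(1 for _, _, label in batch if not label)
    print(epoch, len(batch), n_false)
```

In this sketch every epoch is balanced (as many false pairs as true ones), and the false pairs only repeat after all slices have been used once. Other schemes, such as sampling negatives at random each epoch, would serve the same purpose.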