FormulaNet
Images, Texts | CC-BY 4.0 | Introduced 2022-08-29
FormulaNet is a new large-scale dataset for Mathematical Formula Detection. It consists of 46,672 pages of STEM documents from arXiv, annotated with 13 types of labels. The dataset is split into a training set of 44,338 pages and a validation set of 2,334 pages. For copyright reasons, we can only provide the list of papers; the papers themselves must be downloaded and processed to obtain the pages.
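As a quick sanity check on the split sizes stated above, the following minimal sketch confirms that the two subsets add up to the total page count and reports their relative sizes. The constant and function names are illustrative only and are not part of any FormulaNet tooling.

```python
from typing import Tuple

# Page counts as stated in the dataset description.
TOTAL_PAGES = 46_672
TRAIN_PAGES = 44_338
VAL_PAGES = 2_334


def split_fractions(train: int, val: int) -> Tuple[float, float]:
    """Return the (train, validation) shares of the combined page count."""
    total = train + val
    return train / total, val / total


# The two subsets should account for every page in the dataset.
assert TRAIN_PAGES + VAL_PAGES == TOTAL_PAGES

train_frac, val_frac = split_fractions(TRAIN_PAGES, VAL_PAGES)
print(f"train: {train_frac:.1%}, validation: {val_frac:.1%}")
# prints "train: 95.0%, validation: 5.0%"
```

In other words, the published counts correspond to roughly a 95/5 train/validation split.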
Labels
- inline formulae
- display formulae
- headers
- tables
- figures
- paragraphs
- captions
- footnotes
- lists
- bibliographies
- display formula reference numbers
- display formulae with reference numbers
- footnote reference numbers
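When training a detector on these annotations, the 13 label types are typically mapped to integer class ids. The sketch below shows one way to do that. The snake_case identifiers and the id order (which simply follows the list above) are assumptions made for illustration and may not match the ids used in the actual annotation files.

```python
# Hypothetical class names derived from the label list above;
# the real annotation files may use different names or ids.
LABELS = (
    "inline_formula",
    "display_formula",
    "header",
    "table",
    "figure",
    "paragraph",
    "caption",
    "footnote",
    "list",
    "bibliography",
    "display_formula_reference_number",
    "display_formula_with_reference_number",
    "footnote_reference_number",
)

# Assumed ids: position in the list, starting at 0.
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABELS)}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}

# The dataset description states 13 label types.
assert len(LABELS) == 13
```

A lookup such as `LABEL_TO_ID["table"]` then gives the class id, and `ID_TO_LABEL` converts predictions back to readable names.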