JEMMA
Texts
Introduced 2022-12-18
JEMMA (an Extensible Java Dataset for ML4Code Applications) is a large-scale dataset targeted at machine learning for code (ML4Code). It comes with a considerable amount of pre-processed information, including metadata, representations (e.g., code tokens, ASTs, graphs), and several properties (e.g., metrics, static analysis results), for 50,000 Java projects from the 50KC dataset, covering over 1.2 million classes and over 8 million methods.
Source: JEMMA: An Extensible Java Dataset for ML4Code Applications
Image Source: https://arxiv.org/pdf/2212.09132v1.pdf