A Dataset for Relation Extraction of Natural-Products

A curated evaluation dataset for end-to-end Relation Extraction of relationships between organisms and natural-products

Biomedical | Texts | Creative Commons Attribution Share Alike 4.0 International | Introduced 2023-11-10

Details about the manual annotation:

  • For Chemicals:

    • The chemical labels are annotated as they appear in the abstract.

    • Individual chemicals and classes of chemicals produced by a specific organism are distinguished.

    • The "type" attribute, with values "chemical" or "class", indicates the nature of the mentioned name.

    • A "class" attribute is also included for chemical entities when class information is present in the abstract.

    • Wikidata and PubChem identifiers were assigned to chemicals and classes when available.

  • For Organisms:

    • The organism labels are annotated as they appear in the abstract.

    • If an abstract first mentions only the genus, e.g. "Plakinastrella sp.", and later specifies the species, e.g. "Plakinastrella clathrata", then only the species name is used.

    • A Wikidata identifier was assigned to all organisms.

    • In some abstracts, only the genus name is mentioned.

  • For Relations:

    • Only the relations explicitly mentioned in the abstract are reported in the output labels.

    • Relations are reported in their order of appearance in the abstract.
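
The annotation conventions above can be pictured with a minimal Python sketch. This is not the dataset's actual file format or loader: the class names, field names, and the `prefer_species` helper are hypothetical, chosen only to mirror the rules listed above (a "type" restricted to "chemical" or "class", optional identifiers for chemicals, a Wikidata identifier for every organism, and the genus-then-species rule). Detecting genus-only mentions by a single word or a trailing "sp."/"spp." is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_TYPES = {"chemical", "class"}


@dataclass(frozen=True)
class Chemical:
    label: str                            # label as it appears in the abstract
    type: str                             # "chemical" or "class"
    chemical_class: Optional[str] = None  # only if the abstract gives class information
    wikidata_id: Optional[str] = None     # assigned when available
    pubchem_id: Optional[str] = None      # assigned when available

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")


@dataclass(frozen=True)
class Organism:
    label: str        # label as it appears in the abstract
    wikidata_id: str  # assigned to all organisms


@dataclass(frozen=True)
class Relation:
    organism: Organism
    chemical: Chemical


def _is_genus_only(mention: str) -> bool:
    # Assumption: a genus-only mention is a single word or ends in "sp."/"spp."
    parts = mention.split()
    return len(parts) == 1 or (len(parts) == 2 and parts[1] in {"sp.", "spp."})


def prefer_species(mentions: List[str]) -> List[str]:
    """Drop a genus-only mention when a species of the same genus follows it."""
    kept = []
    for i, mention in enumerate(mentions):
        genus = mention.split()[0]
        superseded = _is_genus_only(mention) and any(
            not _is_genus_only(later) and later.split()[0] == genus
            for later in mentions[i + 1:]
        )
        if not superseded:
            kept.append(mention)
    return kept
```

For the example given in the list, `prefer_species(["Plakinastrella sp.", "Plakinastrella clathrata"])` keeps only the species name, while an abstract that mentions only the genus keeps that genus mention unchanged.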