Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ChEBI-20

Biomedical, Graphs, Texts. Introduced 2021-11-01

The dataset contains 33,010 molecule-description pairs, split 80%/10%/10% into train/val/test sets. The goal of the task is to retrieve the relevant molecule for a natural language description. It is defined as follows:

To push the boundaries of multimodal models, we present a new IR task: Text2Mol.

Given a text query and a list of molecules without any reference textual information (represented, for example, as SMILES strings, graphs, or other equivalent representations), retrieve the molecule corresponding to the query. From a text description of a molecule, the model must incorporate the information in the description into a semantic representation which can be used to directly retrieve the molecule. This requires the integration of two very different types of information: the structured knowledge represented by text and the chemical properties present in molecular graphs. We assume there is only one correct (relevant) molecule for each description, so we consider two measures for this task: Hits@1 and mean reciprocal rank (MRR).
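As a rough illustration of the two measures named above, the following Python sketch computes Hits@k and MRR from the 1-based rank at which each query's single correct molecule is retrieved. The function names and the tie-handling rule in `rank_of_target` are assumptions made for this example, not taken from the Text2Mol evaluation code.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target_index: int) -> int:
    """1-based rank of the correct molecule among all candidates.

    Assumption for this sketch: higher score means a better match, and
    ties are resolved in favor of the target.
    """
    target_score = scores[target_index]
    return 1 + sum(1 for s in scores if s > target_score)


def hits_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose correct molecule is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four queries (not real benchmark results).
example_ranks = [1, 2, 4, 1]
print(hits_at_k(example_ranks, k=1))        # 0.5
print(mean_reciprocal_rank(example_ranks))  # 0.6875
```

Because each description is assumed to have exactly one relevant molecule, a single rank per query is enough to compute both measures.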

80% of the data is used for training. Retrieval is done against the entire corpus of molecules (train, val, test).
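The split arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation from the stated 33,010 pairs and 80%/10%/10% proportions; the helper name `split_sizes` is hypothetical, and the sketch assumes one distinct molecule per pair when sizing the retrieval pool.

```python
def split_sizes(n_pairs: int, percents=(80, 10, 10)) -> tuple[int, int, int]:
    """Return (train, val, test) sizes using integer arithmetic.

    Any remainder from integer division is assigned to the test split
    (an assumption of this sketch, not a documented rule).
    """
    train = n_pairs * percents[0] // 100
    val = n_pairs * percents[1] // 100
    test = n_pairs - train - val
    return train, val, test


n_pairs = 33_010  # figure stated in the dataset description
train, val, test = split_sizes(n_pairs)

# Retrieval is done against train + val + test molecules, so the candidate
# pool for each test query is the whole corpus, not just the test split.
candidate_pool = train + val + test
print(train, val, test, candidate_pool)
```

Under these assumptions, each test description is ranked against roughly ten times more candidates than the test split alone would provide.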

Benchmarks

Cross-Modal Information Retrieval: Hits@1, Hits@10, Mean Rank, Test MRR
Cross-Modal Retrieval: Hits@1, Hits@10, Mean Rank, Test MRR
Drug Discovery: BLEU, Exact Match, Frechet ChemNet Distance (FCD), Levenshtein, MACCS FTS, Morgan FTS, RDK FTS, Text2Mol, Validity, Parameter Count
Image Captioning: BLEU, Exact, Levenshtein, MACCS FTS, Morgan FTS, RDK FTS, Validity
Image Retrieval with Multi-Modal Query: Hits@1, Hits@10, Mean Rank, Test MRR
Molecule Captioning: BLEU-2, BLEU-4, METEOR, ROUGE-1, ROUGE-2, ROUGE-L, Text2Mol
Text-based de novo Molecule Generation: BLEU, Exact Match, Frechet ChemNet Distance (FCD), Levenshtein, MACCS FTS, Morgan FTS, RDK FTS, Text2Mol, Validity, Parameter Count

Statistics

Papers: 43
Benchmarks: 46

Tasks

Cross-Modal Information Retrieval, Cross-Modal Retrieval, Drug Discovery, Image Captioning, Image Retrieval with Multi-Modal Query, Molecule Captioning, Text-based de novo Molecule Generation