Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Unifying Molecular and Textual Representations via Multi-task Language Modelling

Dimitrios Christofidellis, Giorgio Giannone, Jannis Born, Ole Winther, Teodoro Laino, Matteo Manica

2023-01-29
Tasks: Multi-Task Learning, Text-based de novo Molecule Generation, Language Modelling, Molecule Captioning

Abstract

Recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to the need for problem-specific fine-tuning and neglecting task interrelations. The main obstacle in this field is the lack of a unified representation between natural language and chemical representations, complicating and limiting human-machine interaction. Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. Our model can handle chemical and natural language concurrently, without requiring expensive pre-training on single domains or task-specific models. Interestingly, sharing weights across domains remarkably improves our model when benchmarked against state-of-the-art baselines on single-domain and cross-domain tasks. In particular, sharing information across domains and tasks gives rise to large improvements in cross-domain tasks, the magnitude of which increases with scale, as measured by more than a dozen relevant metrics. Our work suggests that such models can robustly and efficiently accelerate discovery in physical sciences by superseding problem-specific fine-tuning and enhancing human-model interactions.

Results

Task: Text-based de novo Molecule Generation (the same values are also listed under the task Drug Discovery)
Dataset: ChEBI-20

Model                    Params  BLEU  Exact Match  Levenshtein  MACCS FTS  RDK FTS  Morgan FTS  FCD    Validity
Text+Chem T5-augm base   220M    85.3  32.2         16.87        90.1       81.6     75.7        0.05   94.3
Text+Chem T5-augm small  60M     81.5  19.1         21.78        86.4       74.4     67.2        0.06   95.1
Text+Chem T5 base        220M    75    21.2         27.39        87.4       76.7     69.7        0.061  79.2
Text+Chem T5 small       60M     73.9  15.7         28.54        85.9       73.6     66          0.066  77.6

FCD = Frechet ChemNet Distance; FTS = fingerprint Tanimoto similarity.

Task: Molecule Captioning
Dataset: ChEBI-20

Model                    BLEU-2  BLEU-4  METEOR  ROUGE-1  ROUGE-2  ROUGE-L
Text+Chem T5-augm-Base   62.5    54.2    64.8    68.2     54.3     62.2
Text+Chem T5-Base        58      49      60.4    64.7     49.8     58.6
Text+Chem T5-augm-Small  56      47      58.8    63.8     48.8     58
Text+Chem T5-Small       55.3    46.2    58.3    63.3     48.1     57.4
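
To make the molecule-generation metrics above more concrete, here is a minimal sketch of how Exact Match, Levenshtein distance, and a Tanimoto similarity could be computed. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: it compares raw SMILES strings, whereas a real evaluation would normally canonicalize molecules and derive fingerprints with a cheminformatics toolkit. The function names and the toy inputs are hypothetical, and the percentage scaling is an assumption inferred from the magnitude of the listed values.

```python
from typing import Sequence, Set


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def _check(predictions: Sequence[str], references: Sequence[str]) -> None:
    if not predictions or len(predictions) != len(references):
        raise ValueError("need two non-empty sequences of equal length")


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference string.
    Assumption: strings are already canonicalized by the caller."""
    _check(predictions, references)
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def mean_levenshtein(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Average edit distance over all prediction/reference pairs (lower is better)."""
    _check(predictions, references)
    total = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total / len(references)


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not bits_a and not bits_b:
        return 1.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


if __name__ == "__main__":
    # Hypothetical toy data: generated vs. reference SMILES strings.
    predicted = ["CCO", "CCN"]
    reference = ["CCO", "CCO"]
    print(exact_match_rate(predicted, reference))
    print(mean_levenshtein(predicted, reference))
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

For Levenshtein and FCD, lower values indicate generated molecules closer to the references; for Exact Match, the FTS scores, and Validity, higher is better.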
