Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

Jia Song, Wanru Zhuang, Yujie Lin, Liang Zhang, Chunyan Li, Jinsong Su, Song He, Xiaochen Bo

2024-10-31 · Cross-Modal Retrieval · cross-modal alignment · text similarity · Contrastive Learning · Retrieval

Paper · PDF · Code (official)

Abstract

Cross-modal text-molecule retrieval models aim to learn a shared feature space for the text and molecule modalities for accurate similarity calculation, which facilitates the rapid screening of molecules with specific properties and activities in drug design. However, previous works have two main defects. First, they are inadequate in capturing modality-shared features, considering the significant gap between text sequences and molecule graphs. Second, they mainly rely on contrastive learning and adversarial training for cross-modality alignment, both of which focus on the first-order similarity, ignoring the second-order similarity that can capture more structural information in the embedding space. To address these issues, we propose a novel cross-modal text-molecule retrieval model with two-fold improvements. Specifically, on top of two modality-specific encoders, we stack a memory-bank-based feature projector that contains learnable memory vectors to better extract modality-shared features. More importantly, during model training, we calculate four kinds of similarity distributions (text-to-text, text-to-molecule, molecule-to-molecule, and molecule-to-text similarity distributions) for each instance, and then minimize the distance between these similarity distributions (namely, second-order similarity losses) to enhance cross-modal alignment. Experimental results and analysis strongly demonstrate the effectiveness of our model. In particular, our model achieves SOTA performance, outperforming the previously reported best result by 6.4%.
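The second-order similarity idea from the abstract can be illustrated with a minimal NumPy sketch. The assumptions here are mine, not the paper's: each instance's four similarity distributions are obtained by a temperature-scaled softmax over in-batch cosine similarities, and the distance between distributions is a KL divergence. All function names and the temperature `tau` are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over similarity rows
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # Row-wise KL(p || q); eps guards against log(0)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def second_order_similarity_loss(text_emb, mol_emb, tau=0.1):
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    # Four in-batch similarity distributions per instance
    p_tt = softmax(t @ t.T / tau)  # text-to-text
    p_tm = softmax(t @ m.T / tau)  # text-to-molecule
    p_mm = softmax(m @ m.T / tau)  # molecule-to-molecule
    p_mt = softmax(m @ t.T / tau)  # molecule-to-text
    # Minimize the distance between intra-modal and cross-modal
    # similarity distributions (hypothetical pairing of the four)
    return kl_div(p_tt, p_tm).mean() + kl_div(p_mm, p_mt).mean()
```

When the two modalities produce identical embeddings, the cross-modal and intra-modal distributions coincide and the loss is zero, which is the alignment target this objective pushes toward.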

Results

Task                                   | Dataset  | Metric    | Value | Model
Image Retrieval with Multi-Modal Query | ChEBI-20 | Hits@1    | 56.5  | Song et al.
Image Retrieval with Multi-Modal Query | ChEBI-20 | Hits@10   | 94.1  | Song et al.
Image Retrieval with Multi-Modal Query | ChEBI-20 | Mean Rank | 12.66 | Song et al.
Image Retrieval with Multi-Modal Query | ChEBI-20 | Test MRR  | 70.2  | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Hits@1    | 56.5  | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Hits@10   | 94.1  | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Mean Rank | 12.66 | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Test MRR  | 70.2  | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Hits@1    | 56.5  | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Hits@10   | 94.1  | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Mean Rank | 12.66 | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Test MRR  | 70.2  | Song et al.

Related Papers

Transformer-based Spatial Grounding: A Comprehensive Survey (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)