Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment

Jia Song, Wanru Zhuang, Yujie Lin, Liang Zhang, Chunyan Li, Jinsong Su, Song He, Xiaochen Bo

2024-10-31 · Cross-Modal Retrieval · cross-modal alignment · text similarity · Contrastive Learning · Retrieval

Paper · PDF · Code (official)

Abstract

Cross-modal text-molecule retrieval models aim to learn a shared feature space for the text and molecule modalities for accurate similarity calculation, which facilitates the rapid screening of molecules with specific properties and activities in drug design. However, previous works have two main defects. First, they are inadequate in capturing modality-shared features, considering the significant gap between text sequences and molecule graphs. Second, they mainly rely on contrastive learning and adversarial training for cross-modality alignment, both of which focus on the first-order similarity, ignoring the second-order similarity that can capture more structural information in the embedding space. To address these issues, we propose a novel cross-modal text-molecule retrieval model with two-fold improvements. Specifically, on top of two modality-specific encoders, we stack a memory-bank-based feature projector that contains learnable memory vectors to better extract modality-shared features. More importantly, during model training, we calculate four kinds of similarity distributions (text-to-text, text-to-molecule, molecule-to-molecule, and molecule-to-text similarity distributions) for each instance, and then minimize the distance between these similarity distributions (namely, second-order similarity losses) to enhance cross-modal alignment. Experimental results and analysis strongly demonstrate the effectiveness of our model. In particular, our model achieves SOTA performance, outperforming the previously reported best result by 6.4%.
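The second-order similarity idea from the abstract can be illustrated with a minimal NumPy sketch. The assumptions here are mine, not the paper's: each instance's four similarity distributions are obtained by a temperature-scaled softmax over in-batch cosine similarities, and the distance between distributions is a KL divergence. All function names and the temperature `tau` are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over similarity rows
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # Row-wise KL(p || q); eps guards against log(0)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def second_order_similarity_loss(text_emb, mol_emb, tau=0.1):
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    # Four in-batch similarity distributions per instance
    p_tt = softmax(t @ t.T / tau)  # text-to-text
    p_tm = softmax(t @ m.T / tau)  # text-to-molecule
    p_mm = softmax(m @ m.T / tau)  # molecule-to-molecule
    p_mt = softmax(m @ t.T / tau)  # molecule-to-text
    # Minimize the distance between intra-modal and cross-modal
    # similarity distributions (hypothetical pairing of the four)
    return kl_div(p_tt, p_tm).mean() + kl_div(p_mm, p_mt).mean()
```

When the two modalities produce identical embeddings, the cross-modal and intra-modal distributions coincide and the loss is zero, which is the alignment target this objective pushes toward.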

Results

Task                                   | Dataset  | Metric    | Value | Model
Image Retrieval with Multi-Modal Query | ChEBI-20 | Hits@1    | 56.5  | Song et al.
Image Retrieval with Multi-Modal Query | ChEBI-20 | Hits@10   | 94.1  | Song et al.
Image Retrieval with Multi-Modal Query | ChEBI-20 | Mean Rank | 12.66 | Song et al.
Image Retrieval with Multi-Modal Query | ChEBI-20 | Test MRR  | 70.2  | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Hits@1    | 56.5  | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Hits@10   | 94.1  | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Mean Rank | 12.66 | Song et al.
Cross-Modal Information Retrieval      | ChEBI-20 | Test MRR  | 70.2  | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Hits@1    | 56.5  | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Hits@10   | 94.1  | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Mean Rank | 12.66 | Song et al.
Cross-Modal Retrieval                  | ChEBI-20 | Test MRR  | 70.2  | Song et al.

Related Papers

Transformer-based Spatial Grounding: A Comprehensive Survey (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)