
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement

Zhangyang Gao, Cheng Tan, Stan Z. Li

2023-05-20 · Tasks: Retrieval, Protein Design, Word Sense Disambiguation
Paper · PDF · Code (official)

Abstract

Recent studies have shown competitive performance in protein design, which aims to find the amino acid sequence that folds into a desired structure. However, most of them disregard the importance of predictive confidence, fail to cover the vast protein space, and do not incorporate common protein knowledge. Given the great success of pretrained models on diverse protein-related tasks, and the observation that recovery is highly correlated with confidence, we ask whether this knowledge can push the limits of protein design further. As a solution, we propose a knowledge-aware module that refines low-quality residues. We also introduce a memory-retrieval mechanism that saves more than 50% of the training time. We extensively evaluate our method on the CATH, TS50, and TS500 datasets, and our results show that Knowledge-Design outperforms the previous PiFold method by approximately 9 percentage points of sequence recovery on the CATH dataset. Notably, Knowledge-Design is the first method to achieve more than 60% recovery on the CATH, TS50, and TS500 benchmarks. We also provide additional analysis to demonstrate the effectiveness of the proposed method. The code will be made publicly available.
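The abstract does not say how residue confidence is measured or how pretrained knowledge is combined with the design model's output. The sketch below is one hypothetical reading, not the authors' implementation. It treats confidence as the maximum softmax probability at each position and flags residues under an assumed threshold of 0.5 as low quality. At those flagged positions only, it averages the design model's probabilities 50/50 with those of a second "knowledge" model (for example, a pretrained protein language model). The names `low_confidence_residues`, `refine`, and `AA` are illustrative.

```python
import numpy as np

# The 20 standard amino acids; column j of a logit matrix scores AA[j].
AA = "ACDEFGHIKLMNPQRSTVWY"

def softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the amino-acid axis (last axis)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def low_confidence_residues(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Indices of positions whose top predicted probability is below `threshold`.

    Assumption: confidence = max softmax probability per position.
    """
    confidence = softmax(logits).max(axis=-1)  # shape (L,)
    return np.flatnonzero(confidence < threshold)

def refine(design_logits: np.ndarray, knowledge_logits: np.ndarray,
           threshold: float = 0.5):
    """Hypothetical refinement step.

    At low-confidence positions, average the design model's probabilities with
    those of a second (e.g. pretrained) model; confident positions are unchanged.
    Returns the refined argmax residue indices and the flagged positions.
    """
    probs = softmax(design_logits)
    flagged = low_confidence_residues(design_logits, threshold)
    probs[flagged] = 0.5 * probs[flagged] + 0.5 * softmax(knowledge_logits)[flagged]
    return probs.argmax(axis=-1), flagged

# Usage: 3 residues; position 1 is uncertain under the design model.
design = np.zeros((3, 20))
design[0, 0] = 10.0      # confident 'A'
design[2, 5] = 10.0      # confident 'G'
knowledge = np.zeros((3, 20))
knowledge[1, 3] = 10.0   # knowledge model favours 'E' at position 1
pred, flagged = refine(design, knowledge)
print(flagged, "".join(AA[i] for i in pred))  # [1] AEG
```

This rule leaves confident predictions untouched and spends the second model only on the positions flagged as uncertain. That matches the abstract's framing of refining low-quality residues. A learned fusion in place of the fixed 50/50 average would be an equally plausible design.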

Results

| Task | Dataset | Model | Perplexity | Sequence Recovery % (All) |
|---|---|---|---|---|
| Protein Design | TS50 | SPIN | n/a | 30.3 |
| Protein Design | CATH 4.2 | Knowledge-Design | 3.46 | 60.77 |
| Protein Design | CATH 4.2 | PiFold | 4.55 | 51.66 |
| Protein Design | CATH 4.2 | ProteinMPNN | 4.61 | 45.96 |
| Protein Design | CATH 4.2 | GVP | 5.36 | 39.47 |
| Protein Design | CATH 4.2 | GCA | 6.05 | 37.64 |
| Protein Design | CATH 4.2 | AlphaDesign | 6.3 | 41.31 |
| Protein Design | CATH 4.2 | StructGNN | 6.4 | 35.91 |
| Protein Design | CATH 4.2 | GraphTrans | 6.63 | 35.82 |
| Protein Design | CATH 4.3 | GVP-large | 6.17 | 39.2 |
| Protein Design | CATH 4.3 | ESM-IF | 6.44 | 38.3 |
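The code below is a minimal sketch of the standard definitions of the two metrics in the table. Sequence recovery is the percentage of positions where the designed residue matches the native one. Perplexity is the exponential of the mean negative log-likelihood of the native residues. Benchmarks differ in how they aggregate across proteins (per-residue average versus per-protein median), so this sketch may not reproduce the exact evaluation protocol behind the numbers above.

```python
import math
from typing import Sequence

def sequence_recovery(pred: str, native: str) -> float:
    """Percentage of positions where the designed residue equals the native one."""
    if not native or len(pred) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(pred, native))
    return 100.0 * matches / len(native)

def perplexity(native_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood of the native residues).

    native_probs[i] is the model's predicted probability of the native
    amino acid at position i.
    """
    if not native_probs:
        raise ValueError("need at least one position")
    nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(nll)

print(sequence_recovery("ACDEF", "ACDKF"))   # 80.0
print(round(perplexity([0.25] * 4), 6))      # 4.0
```

A uniform guess over the 20 standard amino acids gives a perplexity of 20, which is a useful reference point for the 3.46 to 6.63 range in the table. The abstract's improvement of "approximately 9" over PiFold on CATH 4.2 corresponds to 60.77 − 51.66 = 9.11 percentage points of recovery, or about 17.6% in relative terms.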
