Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

AdvanceSplice: Integrating N-gram one-hot encoding and ensemble modeling for enhanced accuracy

Mohammad Reza Rezvan, Ali Ghanbari Sorkhi, Jamshid Pirgazi, Mohammad Mehdi Pourhashem Kallehbasti

2024-02-17
Biomedical Signal Processing and Control 2024 2
Feature Engineering, Data Visualization, Ensemble Learning, Prediction, Classification, Splice Site Prediction

Abstract

Accurate splice site prediction is a critical challenge in genomics, essential for understanding gene expression and disease-associated mutations. Splice sites mark the boundaries between exons and introns in genetic sequences and are crucial for proper RNA splicing and protein synthesis. Existing splice site prediction methods are held back by complex feature extraction and limited accuracy. This study introduces AdvanceSplice, a method that integrates two feature extraction approaches, N-gram one-hot encoding and character-to-numerical encoding, and combines its models by majority voting in an ensemble. The design of AdvanceSplice uses diversity in feature extraction to improve the accuracy of splice site prediction. AdvanceSplice begins with N-gram processing for feature extraction, capturing patterns within DNA sequences. These N-grams are then transformed into binary images using one-hot encoding, which gives a representation better suited to subsequent analysis. Alongside this, character-to-numerical encoding supplies a second, sequence-based view of the same data. Of the five deep learning models in AdvanceSplice, four process the image-like binary representations derived from N-gram encoding, while the fifth processes sequence information through character-to-numerical encoding. This diversified approach lets the ensemble explore patterns and dependencies associated with different N-gram representations and sequence-based features. The ensemble strategy of AdvanceSplice combines the predictions of all five models to improve the overall accuracy of splice site identification. Comparisons with existing models on datasets such as HS3D, Homo sapiens, and A. thaliana indicate that AdvanceSplice identifies splice sites more effectively, contributing to genomics and bioinformatics by improving splice site prediction.
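The two encodings and the voting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the four-letter alphabet, the use of overlapping N-grams, the integer codes, and the binary labels are all assumptions made for illustration, and the deep learning models themselves are omitted.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def ngram_one_hot(seq, n):
    """Binary matrix for a DNA sequence: one row per overlapping n-gram,
    one column per possible n-gram over ALPHABET (4**n columns)."""
    vocab = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=n))}
    rows = []
    for start in range(len(seq) - n + 1):
        row = [0] * len(vocab)
        row[vocab[seq[start:start + n]]] = 1  # KeyError on bases outside ALPHABET
        rows.append(row)
    return rows


def char_to_numeric(seq):
    """Map each base to an integer code (the specific codes are an assumption)."""
    codes = {base: i + 1 for i, base in enumerate(ALPHABET)}
    return [codes[base] for base in seq]


def majority_vote(predictions):
    """Return the label predicted by the most models.
    With binary labels and an odd number of models there are no ties."""
    return Counter(predictions).most_common(1)[0][0]


seq = "ACGTAG"
image = ngram_one_hot(seq, 2)            # 5 rows x 16 columns, one 1 per row
numeric = char_to_numeric(seq)           # [1, 2, 3, 4, 1, 3]
label = majority_vote([1, 0, 1, 1, 0])   # hypothetical outputs of five models
print(len(image), len(image[0]), numeric, label)
```

Each row of the matrix contains exactly one 1, so the matrix can be treated as a binary image. Varying `n` yields a different image of the same sequence, which is one plausible way four image-based models could each see a distinct representation.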

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)