Enzyme Classification through Structural Bioinformatics and Advanced Machine Learning Algorithms
Authors: Pratham Kaushik, Kanwarpartap Singh Gill, Nitin Thapliyal and Ramesh Singh Rawat
Publishing Date: 05-11-2024
ISBN: 978-81-955020-9-7
Abstract
This study presents a novel method for enzyme categorization that combines exploratory data analysis (EDA) with neural network (NN) models. Drawing on a dataset of 858,777 annotated amino acid sequences from 10 different species, the model classifies enzymes in a set of 253,146 samples, with sequences longer than a length threshold removed. During preprocessing, the infrequent amino acid codes X, U, B, and Z are omitted; B and Z occur only in the training set. The neural network architecture, which comprises an embedding layer, bidirectional LSTM layers, and a dense output layer, reaches 79% accuracy on the test set after 20 epochs. A classification report and confusion matrix document the model's performance across 20 enzyme classes. The work shows the value of integrating EDA and NN models in bioinformatics and molecular biology and contributes to enzyme categorization approaches. The next step is to investigate new features and optimisation techniques to improve the model further.
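The preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the length threshold (`MAX_LEN`), the choice to drop a whole sequence when it contains X, U, B, or Z, the integer vocabulary, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
# Illustrative sketch only; values and policies below are assumptions,
# not details taken from the paper.
RARE_CODES = set("XUBZ")   # infrequent codes named in the abstract
MAX_LEN = 500              # hypothetical threshold; the abstract gives no value

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = 0                                        # reserved for padding
UNKNOWN = len(STANDARD_AMINO_ACIDS) + 1        # any other unexpected symbol
# Residues map to 1..20 so that 0 can serve as the padding id.
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(STANDARD_AMINO_ACIDS)}


def keep_sequence(seq, max_len=MAX_LEN):
    """Return True if seq fits the length limit and has no rare codes."""
    return len(seq) <= max_len and not (set(seq) & RARE_CODES)


def encode(seq, max_len=MAX_LEN):
    """Map residues to integer ids and right-pad with PAD up to max_len."""
    ids = [AA_TO_INDEX.get(aa, UNKNOWN) for aa in seq]
    return ids + [PAD] * (max_len - len(ids))


def preprocess(records, max_len=MAX_LEN):
    """Filter (sequence, label) pairs, then encode the retained sequences."""
    return [
        (encode(seq, max_len), label)
        for seq, label in records
        if keep_sequence(seq, max_len)
    ]
```

Integer ids of this kind are the usual input to an embedding layer, with the padding id reserved so that sequences of different lengths can share one fixed-width batch.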
Keywords
Enzyme Classification, Amino Acid Sequences, Exploratory Data Analysis, Neural Networks, Bioinformatics.
Cite as
Pratham Kaushik, Kanwarpartap Singh Gill, Nitin Thapliyal and Ramesh Singh Rawat, "Enzyme Classification through Structural Bioinformatics and Advanced Machine Learning Algorithms", In: Mukesh Saraswat and Rajani Kumari (eds), Applied Intelligence and Computing, SCRS, India, 2024, pp. 33-40. https://doi.org/10.56155/978-81-955020-9-7-4