2023 Volume 23 Pages 43-49
Proteolytic cleavage is influenced by the physicochemical properties of the amino acids surrounding the cleavage site. These properties can be described by 553 amino acid indices, and we reasoned that combining these indices with machine learning could yield QSAR models of protease activity. In this study, we focused on γ-secretase, an enzyme known to be involved in the pathogenesis of Alzheimer’s disease. We created 10,680 regression models for the protease activity of γ-secretase by combining 10 amino acid indices, compressed from the 553 indices by principal component analysis, with 12 pocket models of protease binding sites and 89 machine learning models. We used these regression models to predict the cleavage sites of 23 substrates whose cleavage sites were known, and examined the amino acid property information used by the model with the highest prediction accuracy (87.0%). The amino acid property information used in this model was related to protein secondary structure, which suggests that it may capture information important for transmembrane cleavage by γ-secretase.
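The model count follows from the three factors named above: 10 × 12 × 89 = 10,680. The short Python sketch below enumerates that grid to make the arithmetic explicit. It assumes the models come from the full Cartesian product of the three factors, which the totals are consistent with but the abstract does not state outright. All identifiers (`build_model_grid`, `accuracy_percent`) are hypothetical and for illustration only. The accuracy helper also assumes accuracy is counted per substrate; under that assumption, 20 correct predictions out of 23 would round to the reported 87.0%.

```python
from itertools import product

# Counts taken from the abstract; the names are illustrative placeholders.
N_INDEX_COMPONENTS = 10  # amino acid indices compressed by PCA from 553
N_POCKET_MODELS = 12     # pocket models of protease binding sites
N_LEARNERS = 89          # machine learning models


def build_model_grid(n_indices=N_INDEX_COMPONENTS,
                     n_pockets=N_POCKET_MODELS,
                     n_learners=N_LEARNERS):
    """Enumerate every (index, pocket, learner) combination.

    Assumption: one regression model per element of the full
    Cartesian product of the three factors.
    """
    return list(product(range(n_indices), range(n_pockets), range(n_learners)))


def accuracy_percent(n_correct, n_total):
    """Percentage of correct predictions, rounded to one decimal place.

    Assumption: accuracy is counted per substrate.
    """
    return round(100.0 * n_correct / n_total, 1)


grid = build_model_grid()
print(len(grid))                 # number of regression models in the grid
print(accuracy_percent(20, 23))  # hypothetical 20 of 23 substrates correct
```

Each tuple in `grid` identifies one candidate regression model, so selecting the best model amounts to scoring every tuple on the 23 known substrates and keeping the top performer.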