2024 Volume 17 Pages 27-32
Protein-compound interaction prediction is an important problem in drug discovery. Numerous machine learning methods have been proposed that use protein sequences and compound structures as features. Several methods have also used biological network information, such as protein-protein interactions and compound bioactivities, as additional features. However, previous studies have used only network data from mammals such as humans and mice. Here we develop a new method for protein-compound interaction prediction that uses features learned from the relationships between microorganisms and secondary metabolites in nature (the microbial chemical communication network; MCCN). We used node2vec representation learning to extract compound features from the MCCN, and deep canonical correlation analysis (deep CCA) to obtain features for compounds not included in the MCCN. By incorporating these MCCN-derived features into an existing protein-compound interaction prediction method, we showed that prediction performance improved in several benchmark experiments. We also discuss how our method could be improved by incorporating microbiome co-occurrence information into the MCCN.
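To make the node2vec step more concrete, the following is a minimal sketch of node2vec's second-order biased random walks on a toy microbe-metabolite graph. It assumes an unweighted, undirected bipartite graph. The node names, the edge list `EDGES`, and the values of the return parameter `p`, the in-out parameter `q`, and the walk length are all hypothetical and chosen only for illustration; they are not the MCCN data or the settings used in this work. The skip-gram embedding step that turns walks into feature vectors and the deep CCA mapping are not shown.

```python
import random
from collections import defaultdict

# Hypothetical toy MCCN: edges link microorganisms to secondary metabolites.
EDGES = [
    ("microbe_A", "compound_1"), ("microbe_A", "compound_2"),
    ("microbe_B", "compound_2"), ("microbe_B", "compound_3"),
    ("microbe_C", "compound_1"), ("microbe_C", "compound_3"),
]

def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def node2vec_walk(adj, start, length, p, q, rng):
    """One biased walk: 1/p to return, 1 to stay near prev, 1/q to move outward."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # neighbor shared with the previous node
            else:
                weights.append(1.0 / q)  # move farther from the previous node
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, length, p, q, seed=0):
    """Start num_walks walks from every node, shuffling start order each round."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        order = nodes[:]
        rng.shuffle(order)
        for node in order:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks

if __name__ == "__main__":
    graph = build_graph(EDGES)
    for w in generate_walks(graph, num_walks=1, length=5, p=1.0, q=0.5, seed=42):
        print(" -> ".join(w))
```

In node2vec, such walks are then treated as "sentences" for a skip-gram model (for example, gensim's `Word2Vec`), and the learned vectors for metabolite nodes serve as compound features. Because the graph is bipartite, no neighbor of the current node is also adjacent to the previous node, so only the `1/p` and `1/q` weights arise here.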