Journal of Computer Aided Chemistry
Online ISSN : 1345-8647
ISSN-L : 1345-8647
Classification of metabolites by metabolic pathways concerning terpenoids, phenylpropanoids, and polyketide compounds based on machine learning
Yuri Koide, Daiki Koge, Shigehiko Kanaya, Md. Altaf-Ul-Amin, Ming Huang, Aki Hirai Morita, Naoaki Ono

2023 Volume 23 Pages 25-34


Terpenoids, phenylpropanoids, and polyketides account for the majority of secondary metabolites composed of carbon, hydrogen, and oxygen. In this work, 19,769 metabolites accumulated in the KNApSAcK Core DB were classified into 71 subgroups within three major groups (terpenoids, phenylpropanoids, and polyketides) according to the scientific literature. We represented the metabolites as molecular fingerprints that include chemical properties and used those descriptors for classification with a random forest model. Both training and test metabolites were well classified into the subgroups, with accuracies of 94.06% and 94.23%, respectively. Although classifying metabolites by metabolic pathway is very time-consuming work, machine learning with molecular fingerprints made the classification attainable. This work will shed light on the systematic and evolutionary understanding of diversified secondary metabolites based on secondary metabolic pathways. Data science is an interdisciplinary, applied field that uses techniques and theories drawn from statistics, mathematics, computer science, and information science. By combining these resources, data science enables the extraction of meaningful and practical insights into secondary metabolites.
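
The workflow described in the abstract (fingerprint descriptors, a random forest classifier, and accuracy reported on training and test sets) can be illustrated with a minimal sketch. This is not the authors' pipeline: the fingerprints here are synthetic random bit vectors rather than descriptors computed from KNApSAcK structures, and the fingerprint length, class count, noise level, split ratio, and forest size are all assumptions chosen for illustration. It uses scikit-learn's `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# All sizes below are hypothetical; the paper uses 19,769 metabolites
# and 71 subgroups, with fingerprints derived from real structures.
N_BITS = 128        # assumed fingerprint length
N_CLASSES = 3       # small stand-in for the 71 subgroups
N_PER_CLASS = 100

# Synthetic data: one random prototype bit pattern per class,
# with each class member a noisy copy of its prototype.
prototypes = rng.integers(0, 2, size=(N_CLASSES, N_BITS))
X_parts, y_parts = [], []
for label, proto in enumerate(prototypes):
    flips = rng.random((N_PER_CLASS, N_BITS)) < 0.1  # flip about 10% of bits
    X_parts.append(np.where(flips, 1 - proto, proto))
    y_parts.append(np.full(N_PER_CLASS, label))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)

# Hold out a stratified test set (the 80/20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the forest on fingerprint bits and score both splits.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.4f}")
print(f"test accuracy:  {test_acc:.4f}")
```

With real data, the synthetic block would be replaced by fingerprints computed from each metabolite's structure and by subgroup labels curated from the literature; the split, fit, and scoring steps would stay the same.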

© 2023 The Chemical Society of Japan