2023, Volume 71, Issue 7, pp. 486-494
Computational approaches to drug development are rapidly growing in popularity and have been used to produce significant results. Recent developments in information science have expanded databases and chemical informatics knowledge relating to natural products. Natural products have long been studied extensively, and a large number of unique structures and remarkable active substances have been reported. Analyzing accumulated natural product knowledge using emerging computational science techniques is expected to yield more new discoveries. In this article, we discuss the current state of natural product research using machine learning. The basic concepts and frameworks of machine learning are summarized. Natural product research that utilizes machine learning is described in terms of the exploration of active compounds, automatic compound design, and application to spectral data. In addition, efforts to develop drugs for intractable diseases are addressed. Lastly, we discuss key considerations for applying machine learning in this field. This paper aims to promote progress in natural product research by presenting the current state of computational science and chemoinformatics approaches in terms of their applications, strengths, limitations, and implications for the field.
Natural products are rich sources of useful bioactive substances and unique compounds derived from living organisms.1) Traditionally, medicinal resources or their extracts have been used as folk medicines in many parts of the world.2) As analytic and isolation techniques have improved, specific bioactive compounds have been identified from natural medicinal resources, and further optimization of chemical modification has led to the development of numerous effective drugs3–5) (Fig. 1). Approximately half of the small-molecule drugs approved up to 2019 are natural products, drugs with pharmacophores derived from natural products, or natural product mimics.4) In addition, structures based on natural products were more prevalent among top-selling drugs in 2018 than in 2006, which demonstrates an increasing interest in natural products research for clinical applications.6) Natural products have also been beneficial in the development of drugs aiming to treat intractable diseases. For example, fingolimod, a drug used to treat multiple sclerosis, was developed based on a secondary metabolite from Isaria sinclairii.7,8) Cancer remains a primary concern in drug discovery and chemical research, and approximately 65% of anti-tumor drugs are structurally based on natural products.4) As these facts indicate, natural products play an important role in drug development.

Currently, the development of new drugs based on natural products by the pharmaceutical industry is in decline due to the significant costs, time, and effort required.5) Research on natural products that aims to identify useful compounds for developing novel drugs is typically carried out as follows.1,2,5,9) First, organic materials (plants, microorganisms, marine organisms, etc.) are extracted with solvents to make crude extracts, which serve as aggregates of the material’s constituents. The crude extracts are then divided into fractions based on the physicochemical characteristics of their constituents. This fractionation is performed in several steps using open column chromatography and LC to increase compound purity, and pure compounds can be isolated after repeated separation. Bioassay-guided separation, which is a method for isolating compounds based on the concentration of biological activity in specific fractions, allows researchers to selectively obtain target bioactive compounds. The structures of the obtained compounds are determined by a combination of spectroscopic analyses, such as NMR, mass spectrometry, X-ray crystallography, and circular dichroism (CD) spectral analysis.
Fractionation methods have led to the discovery of compounds with novel structures, including the incidental discovery of highly bioactive compounds. However, several problems have been identified with conventional methods. From a bioactivity perspective, researchers often encounter mixtures of known compounds with relatively low individual activity after applying separation techniques.9,10) Another problem arises when searching for compounds with novel structures. An advantage of natural products is that they provide uncommon and unique structures. However, the more specific the structure, the more difficult it is for researchers to estimate its bioactivity. With this background, drug development based on natural products has been replaced by combinatorial chemistry, owing to inadequate efficiency and certainty. However, a relatively small proportion of newly developed drugs are produced using combinatorial chemistry techniques.4)
In recent years, artificial intelligence (AI) has led to remarkable progress in many fields. In the field of drug discovery, chemoinformatics research utilizing AI and statistical analysis has yielded successful findings. These developments in computational science techniques can be attributed to the popularization of high-performance computational environments and the rapid expansion of available chemical data. Information processing technology based on AI and metrics can facilitate research over a wide scope, including the discovery of potentially useful active compounds and an overview of the target compound group. Today, a growing number of natural products studies applying computational science technologies are being conducted. In natural product chemistry, AI can provide new prospects for utilization of natural products, such as efficiently finding active compounds, and accelerating the process of isolating compounds and other applications.
This review discusses the progress of machine learning and computational techniques in natural products research. First, a basic overview of data-driven methods, which are conducted on the basis of results derived from big data and algorithms, and their characteristics is presented. Second, recent advances in data-driven research are highlighted. Finally, important considerations regarding computational science techniques and the limitations of data use are addressed.
Various approaches to chemical information-based methods have been employed and described in detail in previous studies.11,12) Machine learning is a technique that iteratively learns from large amounts of known data to identify relevant rules and applies these rules to new data to predict outcomes. In other words, machine learning predicts objective variables, such as bioactivity, from explanatory variables, such as structural information.
Machine learning can be categorized into supervised or unsupervised learning depending on the purpose and format of the data (Fig. 2). In supervised learning, both the objective variable (target information, e.g., bioactivity, physicochemical properties, affinity, absorption, distribution, metabolism, excretion (ADME)) and the explanatory variables (or input variables: information used to predict the target, e.g., chemical structure, amino acid sequence, experimental conditions) are used to find patterns between explanatory and objective variables to construct predictive models. Briefly, data (explanatory variables–objective variable pairs) are first collected to create a training dataset and a test dataset. Next, a predictive model is trained on the training dataset using any desired algorithm. The predictive model is then validated on the test data to check its performance (Fig. 3). Supervised machine learning techniques can be further categorized as either regression or discrimination. Regression techniques are those in which the objective variable is a continuous numeric value, and they predict specific values (e.g., IC50 and EC50 values). Discriminant techniques are those in which the objective variable is categorical, and they classify the target into categories (e.g., bioactive or inactive). As an example of supervised learning, a Quantitative Structure–Activity Relationship (QSAR) study with activity as the objective variable and structure as the explanatory variable elucidates the relationship between the two. In unsupervised learning, characteristics and similarities are found only from the input variables without objective variables. Because training is performed without setting the target data (objective variables), the results require interpretation, but they should reflect the essential characteristics of the input variables (explanatory variables).
This approach provides summarization, description, and similarity assessment of the input data. In machine learning, any training method (algorithm) can be selected for the predictive model. Various algorithms, such as support vector machine, random forest, k-nearest neighbor, and deep learning, can be employed to make good predictions.13,14)

Fig. 2. (A) Supervised learning: the objective variable (e.g., activity) is used to find the relationship between the input data (e.g., structural information) and the objective variable. (B) Unsupervised learning: no objective variable is set, and only the input data (e.g., structural information) are used to find rules for summarizing or grouping.

Fig. 3. Datasets are constructed from existing databases and experimental results. A subset of the dataset is used to build the model. The performance of the constructed model is then validated on the remaining test data (validation). After validation assures performance, new data can be used to make predictions.
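The workflow summarized above (collect pairs, split, train, validate) can be sketched in a few lines. The following is a minimal illustration only, assuming synthetic two-dimensional descriptor vectors and a hand-written k-nearest neighbor classifier; the data, the 80/20 split, and the function names `euclidean` and `knn_predict` are assumptions made for this example and do not come from any study cited here.

```python
import random

def euclidean(a, b):
    """Distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training examples."""
    neighbours = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)

# Synthetic data (assumption): (descriptor vector, label); 1 = active, 0 = inactive.
rng = random.Random(0)
dataset = [([rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)], 1) for _ in range(30)]
dataset += [([rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)], 0) for _ in range(30)]

# Step 1: split the collected pairs into a training dataset and a test dataset.
rng.shuffle(dataset)
split = int(0.8 * len(dataset))
train_set, test_set = dataset[:split], dataset[split:]

# Steps 2 and 3: k-NN "training" just stores the data; validate on held-out data.
correct = sum(knn_predict(train_set, x) == y for x, y in test_set)
accuracy = correct / len(test_set)
print(f"test accuracy: {accuracy:.2f}")
```

In practice the algorithm would be replaced by whichever method suits the data (for example, random forest or a support vector machine), but the split-train-validate structure stays the same.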
For machine learning to recognize chemical structures, the structures generally need to be quantified. Molecular descriptors and fingerprints are often used as explanatory variables to apply structural information in machine learning. Molecular descriptors represent the structures of compounds based on their physicochemical properties or substructures. By using molecular descriptors as explanatory variables, we can estimate the relationship between physicochemical properties and target information (e.g., bioactivity or affinity). Molecular fingerprints, which quantify the chemical structure pattern based on the presence or absence of each fragment in a set of substructures, are the most basic descriptors of chemical structures.15) These techniques can be used to visualize chemical spaces,16) determine similarities, and extract structural features for specific properties.6)
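The presence/absence idea behind a molecular fingerprint can be illustrated with a toy example. The sketch below assumes that the fragments of each compound have already been annotated by hand; the six-entry fragment dictionary `FRAGMENT_KEYS` and both compounds are hypothetical, and real fingerprints are computed from the structure itself over hundreds to thousands of keys.

```python
# Hypothetical fragment dictionary (assumption); real key sets are far larger.
FRAGMENT_KEYS = ["hydroxyl", "carbonyl", "aromatic_ring", "lactone", "amine", "glycoside"]

def fingerprint(fragments_present):
    """Return a bit list: 1 if the fragment key occurs in the compound, else 0."""
    present = set(fragments_present)
    return [1 if key in present else 0 for key in FRAGMENT_KEYS]

# Hand-annotated, illustrative fragment sets (not computed from real structures).
compound_a = {"hydroxyl", "aromatic_ring", "glycoside"}
compound_b = {"carbonyl", "lactone"}

fp_a = fingerprint(compound_a)
fp_b = fingerprint(compound_b)
print(fp_a)  # [1, 0, 1, 0, 0, 1]
print(fp_b)  # [0, 1, 0, 1, 0, 0]
```

Each compound becomes a fixed-length numeric vector, which is the form that the learning algorithms described earlier can take as explanatory variables.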
To construct a good machine learning model, training data should be generated from good data sources. It is essential to use a source with a wide distribution of information and a large amount of data. PubChem17) and ChEMBL18,19) are well-known sources of compound information; however, their coverage of natural products is limited.20) In the last 20 years, the number of natural product–specific databases has grown rapidly, and such databases have been summarized in several reviews.21–24) In particular, the Dictionary of Natural Products25) and Supernatural II26) are leading natural product databases that report on a large number of natural products along with their activity information. The KNApSAcK family database utilizes species–metabolite relationships, provides multidimensional information on the spectrum of traditional medicinal properties, and details ingredients of crude drugs and foods worldwide.27) Sorokina et al. developed the COlleCtion of Open Natural prodUcTs (COCONUT) as a universal repository to unify information on natural products scattered in various databases.21) It is hoped that the expansion of databases reporting natural product information will lead to the development of more diverse informatics approaches.
Understanding natural products and their substructures has led to the development of many innovative drugs such as paclitaxel, irinotecan, and artemisinin (Fig. 1). The structures of natural products are diverse, and a certain number of unique structures have been reported.28–31) The structural advantages of natural products in drug development have been identified by analyses of molecular descriptors as well as other methods.5,6,32,33)
Specifically, natural products cover a wider chemical space than combinatorial molecules. A comparison of the chemical space of drugs, combinatorial chemistry compounds, and natural products revealed that the chemical space of drugs and natural products was similar, while the chemical space of synthetic compounds was found to be smaller than those of both drugs and natural products.6,32,33) The structures of natural products typically have greater molecular weights, higher numbers of sp3 carbons, fewer halogens and sulfur, more abundant H bond acceptors/donors, lower lipophilicity, greater rigidity, larger polar surface areas, and larger total surface areas than compounds developed by combinatorial chemistry. Notably, natural products contain more oxygen atoms and fewer nitrogen atoms in each molecule than synthetic compounds. In addition, natural products have more stereocenters and fused rings but fewer aromatic rings and rotatable bonds, suggesting a more rigid and less planar structure.5,6,32,33) As described above, the structural characteristics of natural products are beneficial in drug development.
Meanwhile, natural products differ from drugs in the frequency of ring fusion and the abundance of chiral carbons. These characteristics cause some natural products to violate the “rule of five” (RO5). The RO5 is a well-known empirical principle for assessing the oral bioavailability of compounds34) and functions as a useful benchmark because it allows chemists to intuitively estimate compounds suitable for drug development. However, it has been proposed that some drugs and drug candidates are effective despite not satisfying the RO5; such compounds are called “beyond RO5” (bRO5).35–37) Many natural products are known to be bRO5 compounds. Overall, as drug candidates, natural products offer a structural attractiveness different from that of synthetic compounds derived from combinatorial chemistry. Natural products and their structural modifications are considered to be beneficial for drug development.38)
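As a rough illustration of how the RO5 is applied, the sketch below counts violations of the four commonly cited criteria (molecular weight above 500, calculated logP above 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The descriptor values for the two compounds are invented for this example, and treating more than one violation as "bRO5" is a common convention adopted here as an assumption, not a definition taken from the cited studies.

```python
def ro5_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of the four commonly cited rule-of-five criteria."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # calculated logP above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)

# Illustrative descriptor values for two hypothetical compounds (assumption).
small_synthetic = dict(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5)
large_macrolide = dict(mol_weight=854.0, logp=3.2, h_donors=6, h_acceptors=14)

for name, desc in [("small_synthetic", small_synthetic),
                   ("large_macrolide", large_macrolide)]:
    n = ro5_violations(**desc)
    label = "bRO5 candidate" if n > 1 else "RO5 compliant"
    print(name, n, label)
```

The large, oxygen-rich hypothetical compound fails three criteria, which mirrors the tendency of natural products toward higher molecular weight and more hydrogen bond donors and acceptors.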
Throughout its long history, natural product research has revealed a large number of unique and excellent bioactive compounds. Today, it is noted that although many remarkable results have been achieved, the discovery of structurally unique compounds has been slowing down.39,40) The effective use of chemical information can expand the scope of possibilities for traditional natural product research and encourage new discoveries. In this section, we describe recent studies that have used machine learning to employ chemical information in (i) active compound discovery, (ii) spectral and chromatogram data applications, and (iii) automatic design of structures resembling those of natural products.
4.1. Identification of Active Compounds in Natural Products Using Machine Learning
Natural resources contain numerous bioactive compounds; however, the identification of bioactive compounds in complex mixtures is laborious. Aggregates of weakly active compounds are often observed in bioassay-guided separation. Considering these obstacles, the effective use of known information and AI can accelerate the identification and utilization of bioactive compounds.14)
Virtual screening is commonly used to calculate the biological activity of compounds. Virtual screening typically falls into two categories: structure-based studies, which are based on molecular recognition between target proteins and ligand compounds, and ligand-based studies, which are based on ligand compounds and their physicochemical properties and substructures.41,42) Structure-based studies can estimate the active compound based on its binding mode and binding stability to the target protein. However, they are often computationally intensive and require detailed information on the protein and binding pocket. It is also necessary to consider whether the compound is an agonist or antagonist. Ligand-based studies consider the commonality of chemical structures of active compounds to predict activity, even if the target protein is unknown.42) However, because they assume that new active compounds are structurally similar to known active compounds, it is difficult to discover active compounds that are structurally distinct from known ones. The two types of methods are chosen according to the data available and the purpose of the project.
Bioactivity is attributed to the chemical structure of a compound. Therefore, extensive computational research has been conducted to estimate biological activity from information on chemical structures. Ligand-based research on active compounds is based on the assumption that the chemical structures of active compounds are similar. Therefore, the most basic method for identifying bioactive compounds is to calculate the structural similarity of candidate compounds to known active compounds.43) However, there is a wide variety of similarity metrics and molecular fingerprints, and the number of possible combinations is enormous. A study that proposes a combination of fingerprints and metrics that shows robustness and good evaluation capability for multiple compound collections may be helpful in overcoming this limitation.43)
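A minimal sketch of similarity-based ranking is shown below. It uses the Tanimoto coefficient, which is one widely used similarity metric among the many mentioned above, and represents each fingerprint as the set of indices of its "on" bits. The fingerprints, compound names, and the choice of metric are assumptions for illustration only.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical fingerprints: each set holds the indices of the bits that are set.
known_active = {1, 4, 7, 9, 12}
candidates = {
    "cand_1": {1, 4, 7, 9, 13},
    "cand_2": {2, 5, 8},
    "cand_3": {1, 4, 12},
}

# Rank candidates from most to least similar to the known active compound.
ranked = sorted(candidates,
                key=lambda name: tanimoto(known_active, candidates[name]),
                reverse=True)
for name in ranked:
    print(name, round(tanimoto(known_active, candidates[name]), 2))
```

Swapping either the fingerprint or the metric can change the ranking, which is why the choice of combination discussed above matters.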
Another method for determining bioactivity is the QSAR study, wherein a mathematical model is used to determine the correlation between chemical structure and activity. Regression models are widely used to predict specific activity values from known information on chemical structures such as molecular descriptors and fingerprints.44–46) This method covers the relationship between relatively confined groups of compounds and specific targets. Discriminant models have been employed to predict whether a compound will exhibit activity or not, and are often applicable to compounds with more diverse structures.47–51) In addition to these ligand-based approaches, various structure-based approaches employing information on target proteins have also successfully estimated biological activity.49,52,53) Liang et al. aimed to identify natural products that bind covalently to PLK1, which is associated with cell proliferation.52) They estimated the druggable pocket of PLK1 and used structure-based virtual screening to determine covalently binding natural products and candidate herbs containing them. As a result, baicalein and baicalin from Scutellaria baicalensis were shown to bind covalently to the target. Rodrigues et al. proposed abietane-type diterpenes with potential activity against the 2019 novel coronavirus by examining the results of ligand-based machine learning models together with the results of structure-based molecular docking.49) Wright et al. examined three-dimensional (3D) structure–activity relationships using receptor modeling methodologies in addition to anti-infective and cytotoxicity assays of furanones from Delisea pulchra (cf. fimbriata). They reported that the experimental and computational pharmacophore hypotheses are in agreement.53)
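The regression form of QSAR can be illustrated with the simplest possible case: ordinary least squares on a single descriptor. The sketch below is only an assumption-laden toy. The descriptor values and activities (expressed as pIC50, the negative base-10 logarithm of the molar IC50) are invented, and real QSAR models use many descriptors and a held-out test set.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented training data (assumption): one descriptor value per compound vs. pIC50.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
pic50 = [4.1, 4.9, 6.1, 6.9, 8.0]

slope, intercept = fit_line(descriptor, pic50)

# Predict the activity of a new compound whose descriptor value is 3.5.
predicted = slope * 3.5 + intercept
print(f"pIC50 = {slope:.2f} * x + {intercept:.2f}; prediction at 3.5: {predicted:.2f}")
```

A discriminant model would instead replace the continuous pIC50 with a category such as active or inactive.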
Recently, there have been reports estimating what bioactivity a compound will exhibit without defining a target bioactivity in advance. Stone et al. examined the relationship between structures and therapeutic activity potential using a clustering approach, which is a form of unsupervised learning. Drugs derived from natural products were grouped according to their chemical structures and labeled based on therapeutic group.6) Cockroft et al. developed machine learning models to predict the target proteins of natural products with excellent accuracy.54) They cross-referenced 20 natural product databases with the ChEMBL database, obtaining 5589 compound–target protein pairs to construct their target protein predictive model. The stacking model, which combines two prediction models (k-nearest neighbor and logistic regression), dramatically improved the predictive ability for natural product–protein relationships. Their best model is available as a web application tool called “STarFish”. Determining the bioactivity of compounds has traditionally depended on empirical estimation, which has been a barrier to research progress. An increase in studies employing such non-targeted computational approaches would greatly contribute to drug discovery research.
In addition to chemical structures, various types of information are useful for predicting bioactivity. Genomic information of the source organism has been shown to be a powerful tool for predicting the bioactivity of natural products. Walker and Clardy reported that machine learning applied directly to the biosynthetic gene clusters of microorganisms can predict the antibiotic activity of natural products.55) They first generated a dataset of biosynthetic gene clusters and biological activities of known compounds, and represented biosynthetic gene clusters as vectors of the number of times a particular gene appeared. Then, by learning from biosynthetic gene clusters and bioactivities, a machine learning model was constructed to predict bioactivities from genomic data. The effective prediction of compound bioactivity is important for drug development, and combining multiple types of information and techniques is expected to yield more comprehensive and reliable results.
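The count-vector representation of a biosynthetic gene cluster described above can be sketched as follows. This is an illustration of the general idea only, not a reproduction of the published pipeline; the cluster names and gene-family annotations are hypothetical.

```python
from collections import Counter

# Hypothetical gene-family annotations for three biosynthetic gene clusters.
clusters = {
    "bgc_1": ["PKS_KS", "PKS_AT", "PKS_KS", "methyltransferase"],
    "bgc_2": ["NRPS_A", "NRPS_C", "NRPS_A", "NRPS_A"],
    "bgc_3": ["PKS_KS", "NRPS_A", "oxidoreductase"],
}

# Fixed, sorted vocabulary so every cluster maps to a vector of the same length.
vocabulary = sorted({gene for genes in clusters.values() for gene in genes})

def count_vector(genes):
    """Number of times each vocabulary gene family appears in the cluster."""
    counts = Counter(genes)
    return [counts[gene] for gene in vocabulary]

vectors = {name: count_vector(genes) for name, genes in clusters.items()}
print(vocabulary)
for name, vec in vectors.items():
    print(name, vec)
```

Vectors of this kind, paired with known bioactivity labels, can then serve as the explanatory variables of a supervised model.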
4.2. Spectral Analysis and Machine Learning
The determination of chemical structures remains a complicated and difficult process in the study of natural products. Structural determination of natural products is usually conducted for pure substances by a combination of analytical methods (e.g., 1D and 2D-NMR, mass spectrometry, IR absorption spectroscopy, UV-visible absorption spectroscopy, optical rotation measurement, circular dichroism analysis, and X-ray crystal structure analysis) with special expertise, which makes it highly difficult and potentially costly.56,57) Many recent studies have attempted to perform structure estimation of compounds using machine learning.
Simulations to predict NMR parameters of natural products have been continuously studied and we can easily access them in some databases and software such as SciFinder. Although there are deviations between predicted and measured values for some unique structures, the overall accuracy is improving.56) As other attempts to predict NMR parameters, a program for detecting the misattribution of chemical shifts to structures using neural networks was developed.58) A program was also developed to determine the compound class of natural products by importing 13C-NMR data into a supervised machine learning model.59) In addition, attempts to estimate the structure of compounds from the MS/MS data of mixtures have also been carried out.60,61) CANOPUS uses a deep neural network to determine compound classes with high accuracy, even for compounds without structural reference data.60)
As spectral data contain abundant information, they can be combined with machine learning to provide a great variety of insights. One key problem in the utilization of natural products is that the bioactivity of extracts from natural resources is subject to change owing to the heterogeneity of their constituents, even within the same species. Maraschin et al. used machine learning on propolis NMR data to select appropriate samples with assured homogeneity.62) Floros et al. combined LC-tandem mass spectrometry (LC-MS/MS)-based metabolomics with molecular network analysis to inventory compounds for multiple marine microorganisms, leading to the discovery of novel compounds.63) As these studies show, the combination of spectral analysis and machine learning can also be adapted to mixtures, which could greatly contribute to focusing on target compounds at an early stage. In summary, the application of machine learning to spectral data is adaptable to both pure compounds and mixtures. Although it is currently difficult to automatically predict the whole structure of a natural product owing to the presence of more stereocenters than in synthetic compounds, the accuracy of structure determination is expected to improve as analytical technology advances.
4.3. Automated de Novo Design of Natural Products-like Compounds
As described previously, the chemical structures of natural products have long provided insights for drug design. Synthesizing compounds with reference to the chemical structures or substructures of natural products can yield different characteristics than those of completely synthetic compounds and expand their chemical space. These approaches can also produce compounds with diverse bioactivities and targets.64)
Recently, the computational design of compounds has been actively studied to enrich chemical libraries and broaden chemical space. The major strategies are broadly divided into the building block method and the autoencoder.65) Building blocks are fragments with specific functional groups or substructures. In the building block approach, these fragments are automatically combined to generate compounds. Since the generated compounds are composed of existing fragments, most of them are synthesizable. The autoencoder is another type of de novo chemical generator that utilizes a neural network.66) An autoencoder learns features from the input data and generates new data similar to the input data by reproducing those features. While this method has the strong advantage of generating completely new structures, it also generates chemically invalid or difficult-to-synthesize structures. Gómez-Bombarelli et al. developed the chemical variational autoencoder (Chemical VAE) to generate various chemical structures by training on a large number of chemical structures.67) Chemical VAE is publicly available on their GitHub page and is a useful tool for easily generating structures.
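The combinatorial core of the building block approach can be shown with a deliberately simplified example. The sketch below joins every ring fragment with every substituent by appending SMILES strings, which bonds the last atom of the first fragment to the first atom of the second. The fragment library is hypothetical, and this toy performs no valence, validity, or synthesizability check; practical tools define explicit attachment points or reaction rules instead.

```python
from itertools import product

# Hypothetical fragment library written as SMILES strings (assumption).
cores = ["c1ccccc1", "C1CCCCC1"]       # benzene and cyclohexane rings
substituents = ["O", "N", "C(=O)O"]    # hydroxyl, amine, carboxylic acid

def enumerate_library(cores, substituents):
    """Join every core with every substituent by appending the SMILES strings."""
    return [core + sub for core, sub in product(cores, substituents)]

library = enumerate_library(cores, substituents)
for smiles in library:
    print(smiles)
```

Two cores and three substituents give six products (for instance, `c1ccccc1O` corresponds to phenol), and the library size grows multiplicatively as fragments are added.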
Computational design of new compounds with the characteristics of natural products has also been actively studied.64,67,68) Grigalunas et al. designed 244 pseudo-natural products composed of fragments of natural product substructures. Their chemoinformatics analysis revealed that the pseudo-natural products were mostly located in the region where drugs and natural products intersect, and that these fragment combinations may not occur in nature.64) Zheng et al. developed a quasi-biogenic molecule generator for natural product-like structure generation,69) which can reproduce stereochemical structures unique to natural products by applying a recurrent neural network. Although synthesizability and stability need to be checked, automated de novo design provides new possibilities for drug discovery. De novo designed compounds have been reported to show the desired biological activity.70) Merk et al. found several natural product-like active compounds for retinoid X receptor (RXR) and peroxisome proliferator-activated receptor (PPAR). They incorporated natural product characteristics into a deep neural network trained on a large number of synthetic compounds active on RXR and PPAR, and the designed compounds had both natural product-like structures and the desired activity. In another report, Harada et al. used two methods (Chemical VAE and similarity search) for structural design to search for UV-resistant molecules and found that Chemical VAE efficiently identified compounds exhibiting UV resistance.71) These new structure generation methods are expected to help build virtual natural product libraries and accelerate the identification and optimization of pharmaceutical leads.
In the automated de novo design of chemical structures, metrics for evaluating natural product likeness have been developed.72–74) A number of drug-likeness metrics have been developed to assess the drug potential of candidate chemicals as well.75,76) In addition, programs that determine whether the input chemical structures belong to natural products have been developed.77,78) Although chemical databases contain a vast number of compounds, their ontologies follow basic chemical structures, and the classification of compounds based on natural product skeletons requires expert knowledge and significant effort. The NPClassifier is a deep learning tool for the automated structural classification of natural products that has demonstrated high accuracy.77) Such a program can automate natural product classification and is expected to accelerate natural product research in a wide range of fields, including the discovery of bioactive substances and structure generation. Although there remains room for improvement, de novo molecular design can suggest structures beyond the scope of chemists’ imaginations. Combining the technology of automated chemical design with the structures of natural products is expected to facilitate the discovery of unexplored chemical spaces.
4.4. Machine Learning Approaches to Developing Therapeutics for Intractable Diseases from Natural Products
Natural products have provided solutions to diseases that were previously difficult to treat, such as galantamine for Alzheimer’s disease (AD) or fingolimod for multiple sclerosis. Intractable diseases are generally chronic processes, and their pathogenetic mechanisms are not fully understood due to generally small patient populations and complex pathologies. In general, the development of therapeutic drugs for intractable diseases is difficult because of the large amount of uncertain information. Using computational processing, it is possible to find informative patterns from previous information so that candidate compounds can be selected on the basis of evidence, even for intractable diseases.79,80) As an example, we focus on studies on AD.
AD is a chronic and progressive neurodegenerative disease. As the leading cause of dementia, AD has been recognized by the WHO as a global public health priority. Despite the large gains in our understanding of AD pathogenesis, there are still no established fundamental treatments.81,82)
To date, natural products, such as galantamine, have shown significant value in the development of drugs for AD. Recent computational studies have focused on natural products to identify new candidate drug compounds for AD.83–85) Herrera-Acevedo et al. computationally searched for potential acetylcholinesterase (AChE) inhibitors from secondary metabolites. After narrowing down candidate compounds using a machine learning model, virtual screening and molecular docking calculations were performed to identify two promising sesquiterpene lactones.83) Jeyasri et al. used Bacopa monnieri, traditionally considered a nootropic and memory enhancer, to search for therapeutically useful compounds and elucidate their molecular mechanisms. They generated a network of target proteins and constituents of Bacopa monnieri and a network of target proteins and diseases, proposing possible interactions and biological processes from systems pharmacology and chemoinformatics approaches.84) Ambure et al. used a discriminant model to search for natural products that act on multiple targets in AD, which is a multifactorial disease.85) These studies suggest that the combination of natural products research and informatics can be an effective solution for diseases for which therapeutic agents are inadequate. However, such efforts are still in the developmental stage, and many diseases have not yet been investigated. We hope that more active research will lead to the development of useful medicines for diseases that have historically been difficult to treat.79)
Although machine learning has provided valuable insights and produced excellent results for drug discovery, its limitations must be considered as the improper use of machine learning can lead to erroneous results. This section introduces the limitations and primary considerations for machine learning techniques in this field.
When constructing a dataset for a predictive model, it should be kept in mind that public databases have limited coverage of inactive compounds.86) Cleves and Jain stated that structure–inactivity relationships are as important as structure–activity relationships. Structure–inactivity relationships are used to highlight the structural features of active compounds and validate in silico methods, which is expected to be useful in improving the efficiency of virtual screening. Therefore, the expansion of inactive data in databases is desirable for advancing chemoinformatics-based drug discovery research.87)
Overfitting, in which a mathematical model fits the training data too closely, is a fundamental concern in machine learning. An overfitted model matches the training data so closely that it generalizes poorly to new data. Algorithms that mitigate overfitting have recently been developed and utilized. However, it has been reported that chemical data have potential redundancies and biases that can lead to overfitting, resulting in overly optimistic prediction results.86,88,89) Thus, the contents of the training dataset should be carefully considered beforehand.90)
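The effect described above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the data are synthetic random numbers, the labels are pure noise, and the one-nearest-neighbor model and all function names are assumptions chosen only because such a model memorizes its training set. Because the labels carry no signal, any accuracy above chance reflects memorization rather than learning.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_x, train_y, query):
    """Return the label of the training compound closest to the query."""
    nearest = min(range(len(train_x)),
                  key=lambda i: squared_distance(train_x[i], query))
    return train_y[nearest]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of (xs, ys) that the 1-NN model predicts correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def make_noise_data(n, n_features, rng):
    """Synthetic descriptors with labels that are unrelated to them."""
    xs = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_noise_data(200, 5, rng)
new_x, new_y = make_noise_data(200, 5, rng)

# Near-duplicates of training compounds, mimicking redundant entries
# (e.g., close analogs) that leak into an evaluation set.
dup_x = [[v + rng.uniform(-1e-6, 1e-6) for v in x] for x in train_x]
dup_y = list(train_y)

train_acc = accuracy(train_x, train_y, train_x, train_y)
dup_acc = accuracy(train_x, train_y, dup_x, dup_y)
new_acc = accuracy(train_x, train_y, new_x, new_y)

print(f"training set:            {train_acc:.2f}")
print(f"redundant 'test' set:    {dup_acc:.2f}")
print(f"genuinely new compounds: {new_acc:.2f}")
```

In this toy setting the model scores perfectly on its own training data and almost as well on the redundant evaluation set, yet performs near chance on genuinely new compounds. This is the kind of optimistic estimate that redundancy in chemical data can produce.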
Certain algorithms, such as random forest, can calculate the importance of variables (e.g., structural descriptors), which helps identify the physicochemical properties that are favorable for a drug. For this purpose, the predictive model must be constructed using interpretable structural descriptors. While deep learning is excellent in terms of prediction accuracy, it should be noted that the reasoning behind its predictions is often a black box.90,91) To counter this, interpretable deep learning models have been developed.92–94)
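As a rough illustration of variable importance, the sketch below uses permutation importance, a model-agnostic technique, rather than the impurity-based importance built into random forest implementations. The nearest-centroid classifier, the three descriptor columns, and the synthetic data are all assumptions made for the example. Only the first descriptor is constructed to carry information about the label.

```python
import random


def fit_centroids(X, y):
    """Nearest-centroid model: the mean descriptor vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(x, centroids[lab])))


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(X)


def permutation_importance(centroids, X, y, n_repeats, rng):
    """Mean drop in accuracy when one descriptor column is shuffled."""
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def make_data(n, rng):
    """Synthetic data: descriptor 0 depends on the label, 1 and 2 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(3.0 * label, 1.0),
                  rng.gauss(0.0, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(200, rng)

model = fit_centroids(X_train, y_train)
importances = permutation_importance(model, X_test, y_test, 10, rng)
for j, value in enumerate(importances):
    print(f"descriptor {j}: importance {value:+.3f}")
```

Shuffling the informative descriptor sharply reduces held-out accuracy, whereas shuffling the noise descriptors changes it very little. Such a ranking is useful only when each descriptor has a clear chemical meaning, which is why interpretable descriptors matter.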
The prediction results of machine learning depend strongly on the data used for training. Therefore, the reliability of the prediction becomes questionable if biased data are used.41) In addition, data that deviate substantially from the training data cannot be predicted reliably because they are beyond the applicability domain of the model. Thus, to prevent inaccurate predictions, it is important to verify that the data to be predicted are within the applicability domain of the model before making predictions. Moreover, if the model is constructed using compounds with overly similar structures, it may be able to predict a specific group of compounds, but its range of applicability will generally be limited.41) However, it should also be taken into account that, in structurally diverse datasets, compounds are likely to have different binding modes and binding sites, which can affect how well predictions agree with experimental results.
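One simple way to check the applicability domain is to compare each query compound with its most similar training compound. The sketch below is a minimal illustration under stated assumptions: fingerprints are represented as sets of on-bit indices, the example bit sets are invented, and the 0.3 similarity threshold is a hypothetical value that would need to be tuned for a real model. Other definitions of the applicability domain exist, and this is not the method of any cited study.

```python
from typing import FrozenSet, Iterable

# A binary fingerprint represented by the indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity_to_training(query: Fingerprint,
                               training: Iterable[Fingerprint]) -> float:
    """Similarity between the query and its nearest training compound."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_applicability_domain(query: Fingerprint,
                            training: Iterable[Fingerprint],
                            threshold: float = 0.3) -> bool:
    """True if the query is at least `threshold`-similar to some training compound.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return max_similarity_to_training(query, training) >= threshold


# Invented fingerprints standing in for a training set and two queries.
training_set = [
    frozenset({1, 2, 3, 4}),
    frozenset({2, 3, 4, 5}),
    frozenset({10, 11, 12}),
]
similar_query = frozenset({1, 2, 3, 5})
distant_query = frozenset({20, 21, 22})

for name, query in [("similar", similar_query), ("distant", distant_query)]:
    score = max_similarity_to_training(query, training_set)
    inside = in_applicability_domain(query, training_set)
    print(f"{name} query: max similarity {score:.2f}, in domain: {inside}")
```

A query that falls outside the domain is not necessarily inactive. The model simply has no comparable training examples, so its prediction for that compound should not be trusted.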
Although various machine learning techniques are now readily available, it is essential to ensure reproducibility. It is also important to confirm that the results of the predictions and analyses are consistent with experimental results. Moreover, ensuring the transparency of the predictive model is important for promoting its validity and applicability. Therefore, disclosure of the code and datasets is required.95,96)
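In practice, part of this reproducibility can be supported by recording exactly which dataset and which random seed produced a result. The following sketch shows one possible approach using only the Python standard library. The record format (a SMILES string paired with a label), the manifest fields, and the function names are assumptions made for illustration, and the labels are arbitrary placeholders rather than real activity data.

```python
import hashlib
import json
import random


def dataset_fingerprint(records):
    """SHA-256 digest of the records, independent of their original order."""
    canonical = json.dumps(sorted(records), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seeded_split(records, test_fraction, seed):
    """Deterministic train/test split driven entirely by `seed`."""
    shuffled = sorted(records)  # fixed starting order before shuffling
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records: (SMILES, arbitrary label). Not real activity data.
records = [
    ("CCO", 0),
    ("c1ccccc1", 0),
    ("CC(=O)O", 1),
    ("CCN", 1),
    ("CCCC", 0),
]

# A small manifest that could be published alongside the code and data.
manifest = {
    "dataset_sha256": dataset_fingerprint(records),
    "split_seed": 42,
    "test_fraction": 0.2,
}

train, test = seeded_split(records,
                           manifest["test_fraction"],
                           manifest["split_seed"])
print(json.dumps(manifest, indent=2))
print(f"{len(train)} training records, {len(test)} test records")
```

Publishing such a manifest lets others confirm that they are working with the same data and the same split before they compare their results.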
The benefits that machine learning can bring to drug discovery research are enormous. However, its use requires sufficient knowledge, and care must be taken to ensure the accuracy and applicability of all findings. The results obtained by machine learning should be interpreted and applied with strong expertise in the relevant specialized area.
This review outlined AI approaches to natural product-based drug development and confirmed that natural products have significant potential as drug resources. Various computational approaches can be employed to identify structure–activity relationships that were previously inconceivable. By incorporating computational approaches into natural products research, we expect to promote natural product-based drug discovery and explore new chemical spaces. Information processing techniques can yield varied results, and it is important to note that the results may involve uncertain inferences. Therefore, the results obtained from these techniques must be thoroughly examined for validity from the perspective of natural product chemistry. In addition, the results obtained by computational prediction may conflict with experimental results. It is recommended that experimental results be fed back into computational prediction models to improve their practicality. Creative research on natural products is expected to develop through the application of AI, which has significant implications for the discovery of novel compounds and the development of drugs for intractable diseases.
This work was supported in part by JSPS KAKENHI (Grant number 21K15325).
The authors declare no conflict of interest.