Data mining techniques such as machine learning have greatly advanced the chemical and biological sciences. In particular, further technological advances in data mining are anticipated for analyzing big data derived from biological and environmental systems. From this perspective, we analyzed the complex metabolic and microbial responses of human skin, and the relations among these responses, using advanced data mining techniques. To this end, metabolic profiles of human sweat were characterized using multiple NMR spectra and then analyzed with a strategy based on data-driven and machine learning approaches. These methods extracted the metabolite variables most strongly associated with variations in the microbial community. Moreover, the relation between the sweat metabolites and the skin microbes was visualized using correlation-based networks. This analytical strategy promises to be a versatile and useful approach for big data analyses in various fields of science.
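The correlation-based network mentioned above can be illustrated with a minimal sketch. It assumes Pearson correlation and a fixed cutoff on the absolute coefficient; the abstract does not state which correlation measure or threshold was used, and the function names, the 0.9 cutoff, and the toy values below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)

def correlation_network(metabolites, microbes, threshold=0.9):
    """Return (metabolite, microbe, r) edges with |r| >= threshold.

    The threshold is an assumed illustrative cutoff, not a value from the study.
    """
    edges = []
    for met_name, met_values in metabolites.items():
        for mic_name, mic_values in microbes.items():
            r = pearson(met_values, mic_values)
            if abs(r) >= threshold:
                edges.append((met_name, mic_name, round(r, 3)))
    return edges

# Toy profiles across five samples (invented for illustration only).
metabolites = {
    "lactate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "urea": [2.0, 1.0, 4.0, 3.0, 5.0],
}
microbes = {
    "taxon_A": [2.1, 3.9, 6.2, 8.0, 9.9],
    "taxon_B": [5.0, 4.0, 3.0, 2.0, 1.0],
}

edges = correlation_network(metabolites, microbes)
for edge in edges:
    print(edge)
```

Each retained edge links one metabolite to one microbial taxon; positive and negative coefficients are both kept so that the sign can be shown when the network is drawn.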
In recent years, consumers’ interest in health foods has increased significantly. Among these health foods, fermented foods have traditionally been part of Japanese food culture and have contributed to the maintenance of people's health. Recently, the biological effects of brown rice and rice bran fermented by Aspergillus oryzae (FBRA) have been comprehensively studied, and inhibitory effects on carcinogenesis have been reported. Among the bioactive chemical constituents of FBRA, ferulic acid has been reported to be involved in the biological activity. In this study, we quantitatively investigated how the production of ferulic acid and related compounds in FBRA depends on fermentation time. In addition, we analyzed the generation of aroma-active compounds during fermentation.
In a previous paper, we analyzed the amounts of ferulic acid and its derivatives produced during the fermentation of brown rice and rice bran by Aspergillus oryzae (FBRA). Ferulic acid and its derivatives are considered to be biologically active constituents of FBRA, and the amounts of these compounds increase markedly with fermentation time. Another benefit of fermentation is that it is considered to increase the nutritional value of the food. In this study, we used LC-MS analysis to examine changes in nutritional components of FBRA, such as dipeptides and the free forms of water-soluble vitamins.
Systematic representation of alkaloid biosynthetic pathways based on ring skeletons has been proposed because the skeleton nucleus of an alkaloid is the main criterion for determining its biosynthetic pathway. We therefore extended the idea of ring skeletons to classify and systematize alkaloid compounds, and examined how well this approach predicts biosynthetic pathways based on module elements. We constructed a two-dimensional binary matrix corresponding to 2546 SRS and 478 pathway-known alkaloid compounds. Here, if the ith substring skeleton is present in a target compound, the ith element is set to 1; otherwise, it is set to 0. The relationship between alkaloid compounds and biosynthetic pathways was examined using the dendrogram produced by applying the Ward clustering method to the matrix. Of the 12,243 alkaloid compounds accumulated in KNApSAcK Core DB (http://kanaya.naist.jp/knapsack_jsp/top.html), 3,124 compounds (25.5%) correspond to the pathway-known ring skeletons (187 ring skeletons), but the remaining 9,119 (74.5%) compounds do not. By examining the subring skeleton similarity of the remaining compounds, it may be possible to obtain clues about their pathways and to systematize all alkaloid compounds. Therefore, the present work focuses on the comprehensive systematization of alkaloid compounds and on the construction principles of ring skeletons in alkaloids based on subring skeleton profiling.
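The binary encoding and Ward clustering described above can be sketched as follows. This is a naive illustration in plain Python, assuming squared Euclidean distance between binary profiles; the skeleton labels, compound names, and helper functions are hypothetical and are not taken from KNApSAcK or from the authors' implementation.

```python
def binary_profile(compound_skeletons, all_skeletons):
    """Element i is 1 if the i-th skeleton occurs in the compound, else 0."""
    return [1 if s in compound_skeletons else 0 for s in all_skeletons]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2

def ward_clustering(rows):
    """Naive agglomerative clustering using Ward's merge criterion.

    Returns the merge history as (members_a, members_b, cost) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = [{"members": [i], "centroid": [float(v) for v in row]}
                for i, row in enumerate(rows)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": sorted(a["members"] + b["members"]),
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        merges.append((a["members"], b["members"], cost))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical skeleton labels and compounds, for illustration only.
all_skeletons = ["indole", "pyridine", "quinoline", "tropane"]
compounds = {
    "cmpd_A": {"indole"},
    "cmpd_B": {"indole", "quinoline"},
    "cmpd_C": {"tropane"},
    "cmpd_D": {"tropane", "pyridine"},
}

matrix = [binary_profile(s, all_skeletons) for s in compounds.values()]
merges = ward_clustering(matrix)
for left, right, cost in merges:
    print(left, right, round(cost, 3))
```

The pairwise search makes this sketch cubic in the number of compounds, so it is only suitable for small examples; a matrix of the size described in the text would call for an optimized library routine.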
The modern world produces highly connected, heterogeneous data as information is shared through computer and communication technology. These data form complex relations that must be drilled down into and mined to understand what the data actually mean. Graph clustering is now widely used in computational data analysis. In this paper, we implement a generalized graph clustering algorithm, DPClusO, with an easy operating procedure and clear visualization techniques. DPClusO is an enhanced version of the DPClus algorithm that takes the overlapping property of clusters into consideration along with density and periphery tracking. Users can select different parameters and visualization attributes to render a cluster set, a single cluster, a hierarchical graph, etc., and save these data in image and text formats. This paper discusses the step-by-step operation of the proposed software tool using an example network of metabolites collected from the KNApSAcK database. The tool successfully generated cohesive groups of structurally similar metabolites. It can be used to analyze network data from any field of study.
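The density and periphery quantities that DPClus-style algorithms track can be sketched as below. Cluster density here is the standard ratio of internal edges to the maximum possible number of edges. The cluster-property ratio (edges from a node into the cluster, divided by density times cluster size) is an assumption based on the published DPClus description and is not stated in this abstract; the function names and toy graph are hypothetical, and this is not the DPClusO implementation.

```python
def cluster_density(nodes, edges):
    """Fraction of possible node pairs inside the cluster that are linked."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    return internal / (n * (n - 1) / 2)

def cluster_property(node, nodes, edges):
    """Assumed periphery measure: links from node into the cluster,
    scaled by the cluster's density and size."""
    nodes = set(nodes)
    density = cluster_density(nodes, edges)
    if density == 0:
        return 0.0
    links = sum(1 for u, v in edges
                if (u == node and v in nodes) or (v == node and u in nodes))
    return links / (density * len(nodes))

# Toy undirected graph: two triangles joined through the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]

cluster_1 = {"a", "b", "c"}
cluster_2 = {"c", "d", "e", "f"}  # overlaps cluster_1 at node "c"

density_1 = cluster_density(cluster_1, edges)
density_2 = cluster_density(cluster_2, edges)
cp_d_in_1 = cluster_property("d", cluster_1, edges)
shared = cluster_1 & cluster_2

print(density_1, round(density_2, 3), round(cp_d_in_1, 3), shared)
```

Allowing `cluster_1` and `cluster_2` to share node `c` illustrates the overlapping property that distinguishes DPClusO from a strict partition of the graph.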
It has long been understood that the centrality of proteins in protein-protein interaction (PPI) networks is related to their essentiality. In the present work, we validate the relations between the essentiality of yeast proteins and their centrality measures in a PPI network using a different approach based on the receiver operating characteristic (ROC) curve. We found that all centrality measures are related to essentiality; however, degree centrality performed best on the data we used. By closely examining the different centrality values of yeast proteins, we found that they are not highly correlated, which led us to hypothesize that centralities might be related to gene/protein functions. Indeed, we found that many of the clusters generated from the pattern of centrality values are rich in proteins of similar function. Different types of centrality values imply different types of importance of a node in a network, and the functions of genes are of various types. We therefore hypothesized that important genes with different functions may tend to show different patterns of centralities, and here we show some preliminary links between groups of genes with similar functions and profiles of centrality values. The concepts of network biology discussed in this paper are applicable to other networks, including networks of chemical compounds.
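The ROC-based validation described above can be illustrated with a minimal sketch that scores each protein by its degree and computes the area under the ROC curve against essentiality labels. The AUC is computed with the rank-comparison (Mann-Whitney) formulation; raw degree is used because dividing by a constant does not change the ranking. The toy network, protein names, and essentiality labels are invented for illustration and do not come from the yeast data.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Raw (unnormalized) degree of every node in an undirected edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy PPI network with hypothetical proteins P1..P6.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
         ("P2", "P3"), ("P5", "P6")]
essential = {"P1", "P2"}  # invented labels, for illustration only

degree = degree_centrality(edges)
proteins = sorted(degree)
scores = [degree[p] for p in proteins]
labels = [p in essential for p in proteins]

auc = roc_auc(scores, labels)
print(degree, auc)
```

An AUC of 0.5 would mean the centrality measure ranks essential proteins no better than chance, so comparing AUC values across centrality measures gives one way to rank them on a given data set.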
The identification of new compound-protein interactions has long been a fundamental quest in the field of medicinal chemistry. With increasing amounts of biochemical data, advanced machine learning techniques such as active learning have proven beneficial for building high-performance prediction models from subsets of such complex data. In a recently published paper, chemogenomic active learning was applied to the interaction spaces of kinases and G protein-coupled receptors, featuring over 150,000 compound-protein interactions. Prediction models were actively trained based on random forest classification using 500 decision trees per experiment. In a new direction for chemogenomic active learning, we address the question of how forest size influences model evolution and performance. In addition to the original chemogenomic active learning finding that highly predictive models could be constructed from a small fraction of the available data, we find here that model complexity, as measured by forest size, can be reduced to one-fourth or one-fifth of the previously investigated forest size while still maintaining reliable prediction performance. Thus, chemogenomic active learning can yield predictive models with reduced complexity based on only a fraction of the data available for model construction.