Chemoinformatics is the application of informatics methods to solve chemical problems. Although the term was introduced only a few years ago, the field has a long history, with roots going back more than 40 years. These different origins have now merged into an active discipline of its own. All areas of chemistry, from analytical chemistry to drug design, can benefit from chemoinformatics methods. Many challenging chemical problems still await solutions through the further development of chemoinformatics.
Biomass has attracted attention as an important alternative resource to petroleum. To realize a sustainable society, chemical processes that use biomass resources as chemical feedstocks need to be developed. We describe a methodology for automatically generating synthetic routes from carbohydrates (monosaccharides) to a given target compound. Each monosaccharide or its hydrate is converted into various compounds by dehydration, dehydrogenation, or hydrogenation. These compounds are converted into tautomers (same molecular formula) by tautomerization steps. Each tautomer is converted into carbocycles with the same formula by cyclization. These operations are repeated until the molecular formula of the compound set agrees with that of the target compound. The synthetic routes from monosaccharides to the target compound are then found by search and matching operations, carried out on a Linux computer using our Pascal program.
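The formula-matching part of the procedure above can be illustrated with a small sketch. It is not the authors' Pascal program: it tracks only molecular formulas as hypothetical (C, H, O) tuples, ignores structures entirely, and uses a plain breadth-first search. Tautomerization and cyclization leave the formula unchanged, so they do not appear as steps at this level. The names `OPS` and `find_route`, the step limit, and the hydrogen-count bound are all assumptions made for illustration.

```python
from collections import deque

# Change in (C, H, O) for each formula-changing operation named in the text.
OPS = {
    "dehydration": (0, -2, -1),      # lose H2O
    "dehydrogenation": (0, -2, 0),   # lose H2
    "hydrogenation": (0, 2, 0),      # gain H2
}


def find_route(start, target, max_steps=6):
    """Breadth-first search over molecular formulas (assumed simplification).

    Returns the shortest list of operation names that turns the formula
    `start` into `target`, or None if no route exists within `max_steps`.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        formula, path = queue.popleft()
        if formula == target:
            return path
        if len(path) == max_steps:
            continue
        for name, delta in OPS.items():
            c, h, o = (a + b for a, b in zip(formula, delta))
            # Discard impossible formulas: negative counts, or more
            # hydrogens than a saturated acyclic C/H/O compound allows.
            if h < 0 or o < 0 or h > 2 * c + 2:
                continue
            nxt = (c, h, o)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


glucose = (6, 12, 6)   # C6H12O6
hmf = (6, 6, 3)        # 5-hydroxymethylfurfural, C6H6O3
print(find_route(glucose, hmf))
```

A real route generator would apply each operation to explicit structures and enumerate tautomers and ring closures at every step. The formula-level search only shows when a set of operations can bring the formula into agreement with the target.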
We have started to develop a High Resolution Molecular Spectroscopy literature DataBase, HRMSDB. Part of HRMSDB has been publicly available since January 2005 at RIO-DB (Research Information Database) of the National Institute of Advanced Industrial Science and Technology, Tsukuba (http://www.aist.go.jp/RIODB/hrmsdb/index.html). HRMSDB comprises references that report the results of high resolution molecular spectroscopic studies, together with some from related areas such as reaction dynamics, astronomy, atmospheric chemistry, plasma science, and ab initio calculations. "High resolution" means that the rotational structure is resolved, but the actual coverage of the data is somewhat broader. About 20,000 references published since the early 1950s have been collected by one of the authors (EH), and new publications will be added continually to keep the database up to date. Each record includes the following items: the record identification number; the chemical formula of the molecule (or atom) under consideration; the title and author(s) of the paper; the name of the journal in which it was published, along with the volume number, page(s), and year; and a few keywords. Users can search for any word(s) in these items and can also generate a KWIC (keyword-in-context) list. Some principal molecular constants, the spectroscopic methods employed, and other noteworthy information are also available.
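The record layout and the two retrieval operations described above (word search and a KWIC list) can be sketched as follows. This is only an illustration, not the HRMSDB implementation: the `Record` class, the field names, the stop-word list, and both sample records are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One literature record with the items listed in the text (names assumed)."""
    record_id: int
    formula: str
    title: str
    authors: str
    journal: str
    volume: str
    pages: str
    year: int
    keywords: tuple


# Words skipped when building the KWIC list (an assumed, minimal list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def search(records, word):
    """Return records in which `word` occurs in any text field (case-insensitive)."""
    w = word.lower()
    hits = []
    for r in records:
        fields = (r.formula, r.title, r.authors, r.journal, " ".join(r.keywords))
        if any(w in f.lower() for f in fields):
            hits.append(r)
    return hits


def kwic(records, width=30):
    """Build a keyword-in-context list from the titles, sorted by keyword."""
    entries = []
    for r in records:
        words = r.title.split()
        for i, w in enumerate(words):
            key = w.strip(".,;:()").lower()
            if not key or key in STOPWORDS:
                continue
            left = " ".join(words[:i])[-width:]        # context before the keyword
            right = " ".join(words[i + 1:])[:width]    # context after the keyword
            entries.append((key, left, w, right, r.record_id))
    entries.sort(key=lambda e: (e[0], e[4]))
    return entries


# Invented sample records, for illustration only.
records = [
    Record(1, "H2O", "Rotational spectrum of water", "A. Author",
           "Example Journal", "1", "1-10", 2000, ("microwave",)),
    Record(2, "CO", "Infrared spectrum of carbon monoxide", "B. Author",
           "Example Journal", "2", "11-20", 2001, ("infrared",)),
]

for key, left, word, right, rid in kwic(records):
    print(f"{left:>30} [{word}] {right:<30} #{rid}")
```

Substring matching keeps the search simple; a production system would more likely use an inverted index built once over all fields.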
The phagocyte NADPH oxidase complex plays a crucial role in host defense against microbial infection through the production of reactive oxygen species. Key to the activation of NADPH oxidase is the cytoplasmic subunit p47phox, which includes tandem SH3 domains and a polybasic region (Figure 1). Although the crystal structures of the active and inactive states of p47phox were recently determined, the conformational change that connects these two structures remains to be elucidated. Our simulations revealed that phosphorylation of Ser303, Ser304, and Ser328, which is important for the activation of p47phox, contributes to structural changes in a region that is separated from these serine residues. We further conclude that the ligand exchange of p47phox during activation of NADPH oxidase is induced by the interaction between the membrane subunit p22phox and the N-terminal SH3 domain of p47phox, which is exposed to solvent upon phosphorylation.
In this study, we report the synthesis of bis(4-methoxyphenyl)methanofullerene and the separation of its trans-1 and e isomers. The number of adducts was determined by FAB MS, whereas the MALDI-TOF MS spectrum reveals a 'ghost peak' corresponding to the mass of one additional adduct (Figure 7). The observed and calculated (ZINDO/S) absorption spectra of the separated isomers were similar in the 300-400 nm range. The difference between the isomers appears in the 500-550 nm range, where a broad band for the e isomer is both observed and calculated (Figure 9). It appears that the nature of this addend did not affect the transitions. The qualitative yields of the bis-adduct isomers were predicted from the LUMO of the mono-adduct and confirmed by the experimental results. In the same way, the qualitative yields of the tri-adducts were predicted, and we found that tri-adduct derivatives common to the trans-1 and e isomers are scarce. Additionally, we report a new software tool that creates a structure file containing the coordinates of any multi-adduct (Figure 2).
We propose four models for water purification in a river that flows through a large urban area. All of the models are discrete expressions: a simple-expression model, a model that adds the effects of dams, a model that adds underground penetration, and a paired-unknown-coefficient model. Using a neural network, we analyze changes in water quality in a virtual river defined by the models. The objective is to test the simulation ability of the discrete expressions and to discuss the possibility of inverse prediction of the model. If the inverse operation could be predicted, the water purification of a river could be estimated from a data set alone. The neural network is a useful tool for analyzing non-linear phenomena. The analysis requires a discrete data set that includes observations and descriptor data. The neural network can emulate the phenomena through learning iterations, but defects in the data set halt the iterations. The defects are classified into three cases. In the first, a defective element is known to exist but its value is unknown. In the second, all data for a descriptor are lost. In the third, the existence of a descriptor is itself uncertain. The first case is called a "defect" and has been studied recently; the latter two have not. The latter two cases need to be addressed in research on environmental problems, and they are also important for significance tests of neural-network outputs. We investigated neural-network functions for the latter two cases in multi-regression analysis. The main aim is to evaluate the limits of these functions. Because the statistical character of the error is not clear, we consider error-free cases to simplify the study. We therefore define a virtual river whose data are constructed from uniform random numbers, and a defective part is deliberately introduced into the data set. The study shows the following: 1. A neural network outputs reasonable water quality for a river even if a descriptor is defective. 2. The partial derivatives do not indicate the descriptor characteristics accurately when the target descriptor is not the defective one. 3. The cause is that another descriptor compensates for the defect; i.e., there are interactions among descriptors. 4. The largest change among all the partial derivatives indicates the compensating descriptor. 5. The characteristics of the defective descriptor can be calculated approximately. 6. This is possible when the outline of the phenomena is known. Thus, the discrete expression of a river makes changes in water quality calculable in general cases, and by inverse calculation the descriptive equation of the phenomena can be predicted from observation data alone.
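The experiment described above (an error-free virtual river built from uniform random numbers, a descriptor removed on purpose, and partial derivatives of the trained network compared with and without it) can be sketched as follows. None of this is taken from the original study: the three descriptors, the water-quality formula, the network size, and the training settings are all assumptions chosen to keep the example short.

```python
import numpy as np


def make_virtual_river(n=200, seed=0):
    """Hypothetical virtual river: three descriptors from uniform random numbers."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 3))
    # Assumed interaction: descriptor 2 partly tracks descriptor 1,
    # so it can compensate when descriptor 1 is missing.
    X[:, 2] = 0.7 * X[:, 1] + 0.3 * X[:, 2]
    y = 0.6 * X[:, 0] + 0.4 * X[:, 1] ** 2   # error-free "water quality"
    return X, y


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def train_mlp(X, y, hidden=6, epochs=2000, lr=0.05, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(y)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / n                        # dLoss/dprediction
        dZ = np.outer(g_out, W2) * (1.0 - H ** 2)    # back through tanh
        gW1, gb1 = X.T @ dZ, dZ.sum(axis=0)
        gW2, gb2 = H.T @ g_out, g_out.sum()
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def input_sensitivity(params, X):
    """Mean partial derivative of the output with respect to each descriptor."""
    W1, b1, W2, _ = params
    H = np.tanh(X @ W1 + b1)
    return (((1.0 - H ** 2) * W2) @ W1.T).mean(axis=0)


X, y = make_virtual_river()
full, full_losses = train_mlp(X, y)

keep = [0, 2]                        # descriptor 1 is the deliberately missing one
reduced, reduced_losses = train_mlp(X[:, keep], y)

d_full = input_sensitivity(full, X)
d_reduced = input_sensitivity(reduced, X[:, keep])
change = d_reduced - d_full[keep]    # shift in partial derivatives after the defect
print("full:", d_full, "reduced:", d_reduced, "change:", change)
```

Comparing `change` across the remaining descriptors is one way to look for the compensating descriptor. Whether the largest change actually identifies it depends on the data and the training run, so the sketch prints the values and makes no claim about them.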
In molecular design and drug design, efficient computer-based methodologies are in demand, and various computer programs have been investigated. Our laboratory carries out research in chemoinformatics and chemometrics and has been developing computer programs for these analyses. ToMoCo (Total System for Molecular Designs by the Computer Chemistry Laboratory) is a total system for molecular design that we developed by integrating these in-house programs. ToMoCo provides functions related to quantitative structure-activity relationships (QSAR), such as CoMFA (comparative molecular field analysis), a molecular structure alignment method using a Hopfield neural network, a region selection method based on GARGS (genetic algorithm-based region selection), and an automatic structure generation method using LigConstructor. In this paper, we describe these functions and the results of a QSAR analysis of cyclooxygenase-2 (COX-2) inhibitors using ToMoCo.