MART (multiple additive regression trees) is a powerful classification and regression method with a distinctive characteristic: the more slowly we proceed, the more quickly we reach the destination. This is a paradox in the field of machine learning. If this concept could be applied to chemistry education, both students and teachers would benefit greatly. We therefore tried both the mind map of Tony Buzan and the panorama method (Y. Hatamura's thought-expansion diagram combined with images) in the Science Partnership Program (class A) of JST, because both methods build an understanding of the whole picture either by accumulating many small questions and answers or by arranging all the components needed to solve a problem. After participants used these methods to express their interpretations of oxidation-reduction reactions, thermochemical equations and other topics, we asked for their opinions of the methods, and their reactions were relatively favorable.
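The "slower is faster" behavior of MART comes from shrinkage: each new tree is fitted to the current residuals and then scaled down by a small learning rate before it is added, so many small steps replace a few large ones. The following is a minimal NumPy sketch, assuming squared-error loss, a single numeric feature and one-split trees (stumps). The function names are hypothetical, and this is not a full MART implementation (no subsampling, no deeper trees).

```python
import numpy as np

def fit_stump(x, r):
    """Return (threshold, left_value, right_value) minimizing squared error on residuals r."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (xs[-1], rs.mean(), rs.mean())  # fallback: no split
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # cannot split between tied values
        left, right = rs[:k], rs[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = (0.5 * (xs[k - 1] + xs[k]), left.mean(), right.mean())
    return best

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def mart_fit(x, y, n_rounds=200, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each stump fits the current residuals
    (the negative gradient) and is added after shrinking by learning_rate."""
    f0 = float(y.mean())
    pred = np.full(len(y), f0)
    stumps, losses = [], [float(np.mean((y - pred) ** 2))]
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * stump_predict(stump, x)
        stumps.append(stump)
        losses.append(float(np.mean((y - pred) ** 2)))
    return {"f0": f0, "stumps": stumps, "learning_rate": learning_rate, "losses": losses}

def mart_predict(model, x):
    pred = np.full(len(x), model["f0"])
    for stump in model["stumps"]:
        pred = pred + model["learning_rate"] * stump_predict(stump, x)
    return pred

if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 6, 300)
    y = np.sin(x) + rng.normal(0, 0.3, 300)
    x_train, y_train, x_test, y_test = x[:200], y[:200], x[200:], y[200:]
    for lr in (1.0, 0.1):
        model = mart_fit(x_train, y_train, n_rounds=200, learning_rate=lr)
        test_mse = np.mean((mart_predict(model, x_test) - y_test) ** 2)
        print(f"learning_rate={lr}: test MSE {test_mse:.3f}")
```

With learning_rate=1.0 each stump is added at full size. The benefit usually attributed to shrinkage concerns error on unseen data rather than the training loss, which is why the usage compares held-out error.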
A vast number of chemicals are present in our environment and are considered indispensable to our highly technological society. However, some chemicals are hazardous and can have extensive impacts on both human health and the global environment. In Japan, ecotoxicity tests of chemical substances have been conducted since 1995 with the goal of contributing to the Organization for Economic Cooperation and Development (OECD) program for evaluating high-production volume (HPV) chemicals for harmful effects. To date, only about 500 compounds have been tested. Quantitative structure-activity relationships (QSARs) may enable us to predict the environmental toxicities and fates, as well as the physical and chemical properties, of compounds; toxicity prediction by QSAR is therefore a possible alternative to measuring ecotoxicity. In this study, we generated QSAR models from toxicity tests on daphnids using only 3D descriptors, to examine the usefulness of particular 3D descriptors for predicting the ecotoxicity of compounds with diverse structures. The prediction accuracy of the resulting model was adequate and better than that of a model using only the n-octanol/water partition coefficient, logP(o/w).
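The abstract does not state which regression method was used. Assuming multiple linear regression (MLR), a common QSAR choice, validated by leave-one-out cross-validation, a comparison between a logP(o/w)-only model and a model built on 3D descriptors could be sketched as follows. The descriptor matrix and toxicity values in the usage block are synthetic placeholders, not data from the study, and the function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Multiple linear regression with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def predict(model, X):
    intercept, coef = model
    return intercept + X @ coef

def r2(y, y_hat):
    """Coefficient of determination on the fitted data."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def loo_q2(X, y):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot: each compound is predicted
    by a model fitted without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(model, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

if __name__ == "__main__":
    # Synthetic illustration only: 40 hypothetical compounds.
    rng = np.random.default_rng(0)
    logp = rng.normal(2.0, 1.0, size=(40, 1))
    desc_3d = rng.normal(size=(40, 3))  # placeholder 3D descriptors
    tox = 0.6 * logp[:, 0] + desc_3d @ np.array([0.8, -0.5, 0.3]) + rng.normal(0, 0.2, 40)
    for name, X in (("logP only", logp), ("3D descriptors", desc_3d)):
        model = fit_mlr(X, tox)
        print(name, round(r2(tox, predict(model, X)), 3), round(loo_q2(X, tox), 3))
```

For ordinary least squares the leave-one-out residuals are never smaller than the fitted residuals, so q² is always at most R²; a large gap between them is a warning sign of overfitting.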
Prediction of cytochrome P450 (CYP) 3A4 substrates is valuable for finding promising drug candidates, and considerable attention has been devoted to in silico predictions. Machine learning (ML) methods have recently been explored for ligand-based approaches. ML methods use supervised learning techniques such as neural networks, support vector machines and Bayesian approaches to develop statistical models. In this paper, we used a Bayesian approach to classify CYP 3A4 substrates and non-substrates, with the extended connectivity fingerprint (ECFP) as the chemical descriptor. An atom score was calculated from the Bayesian score of each fingerprint feature, and each compound was color-mapped by visualizing its atom scores in five color grades. This mapping can be used for the chemical interpretation of why a specific compound is a CYP 3A4 substrate. The established Bayesian model and the associated color mapping should be useful for avoiding the risks associated with CYP 3A4 substrates in early drug discovery. Used in parallel with the prediction of oxidation sites described in the subsequent paper, this approach enables de novo predictions for any molecule with respect to CYP 3A4.
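The abstract does not give the scoring formula. A common estimator for fingerprint-based Bayesian classifiers is the Laplacian-corrected naive Bayes weight log((A_f + 1) / (T_f · P + 1)), where T_f is the number of training compounds containing feature f, A_f the number of those that are substrates, and P the overall substrate fraction. The sketch below assumes that estimator and represents each ECFP as a set of integer feature IDs. The atom score (sum of the weights of the features whose environments contain the atom) and the equal-width five-grade binning are hypothetical choices for illustration, as are the function names.

```python
import math
from collections import Counter

def train_laplacian_bayes(fingerprints, labels):
    """fingerprints: list of sets of ECFP feature IDs; labels: True for substrates.
    Returns a Laplacian-corrected log weight for every feature seen in training."""
    p_base = sum(labels) / len(labels)  # overall fraction of substrates
    total, active = Counter(), Counter()
    for features, is_substrate in zip(fingerprints, labels):
        for f in features:
            total[f] += 1
            if is_substrate:
                active[f] += 1
    return {f: math.log((active[f] + 1) / (total[f] * p_base + 1)) for f in total}

def bayes_score(features, weights):
    """Sum of feature weights; features unseen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)

def atom_grades(features, feature_atoms, weights, n_grades=5):
    """Hypothetical atom score: sum of the weights of the features covering each atom,
    binned into n_grades equal-width color grades (0 = lowest, n_grades - 1 = highest).
    feature_atoms maps a feature ID to the atom indices of its environment."""
    scores = {}
    for f in features:
        for atom in feature_atoms.get(f, ()):
            scores[atom] = scores.get(atom, 0.0) + weights.get(f, 0.0)
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {atom: n_grades // 2 for atom in scores}
    return {atom: min(n_grades - 1, int((s - lo) / (hi - lo) * n_grades))
            for atom, s in scores.items()}

if __name__ == "__main__":
    # Toy training set with made-up feature IDs.
    fps = [{1, 2}, {1, 3}, {4, 2}, {4, 3}]
    labels = [True, True, False, False]
    w = train_laplacian_bayes(fps, labels)
    print(bayes_score({1, 2}, w), atom_grades({1, 2}, {1: [0, 1], 2: [1, 2]}, w))
```

Features seen mostly in substrates get positive weights and features seen mostly in non-substrates get negative ones; the +1 terms keep rare features from receiving extreme weights.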
The excitation energies of uracil dimers have been calculated with the configuration interaction singles (CIS) method, the configuration interaction singles with perturbative doubles correction (CIS(D)) method, the equation-of-motion coupled-cluster method with single and double excitations (EOM-CCSD), and time-dependent density functional theory (TDDFT) with pure and hybrid functionals. It is shown that the charge-transfer excitation energies exhibit a 1/R dependence in the large intermolecular distance limit and that the locally excited states are split by the so-called electronic couplings. The excitation energies calculated by the ab initio methods (CIS, CIS(D), EOM-CCSD) reproduce these features well. However, the TDDFT results fail to show these features because of the short-sightedness of the exchange-correlation functionals of DFT. The inclusion of electron correlation is crucial to the excitation energies, especially to those of the charge-transfer excitations, which are decreased by more than 2 eV.
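The two features described in the uracil-dimer abstract, and the TDDFT failure, follow from standard asymptotic expressions. In atomic units, with R the intermolecular separation, D and A the donor and acceptor monomers, E_LE the local excitation energy of an isolated monomer, V the electronic coupling and c_x the fraction of exact exchange in a hybrid functional, these textbook forms are shown for orientation only; they are not results of the study:

```latex
\begin{aligned}
\omega_{\mathrm{CT}}(R) &\approx \mathrm{IP}_D - \mathrm{EA}_A - \frac{1}{R}
  && \text{(correct long-range limit)} \\
E_{\pm} &= E_{\mathrm{LE}} \pm |V|, \qquad \Delta E = E_{+} - E_{-} = 2|V|
  && \text{(two identical monomers)} \\
\omega_{\mathrm{CT}}^{\mathrm{TDDFT}}(R) &\approx \varepsilon^{A}_{\mathrm{LUMO}} - \varepsilon^{D}_{\mathrm{HOMO}} - \frac{c_x}{R}
  && (c_x = 0 \text{ for pure functionals})
\end{aligned}
```

With c_x = 0 the TDDFT charge-transfer energy has no 1/R dependence at all, and a hybrid functional recovers only the fraction c_x of it.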
Hydrogen bonding in the HCN-H2O-HCN trimer has been studied by means of ab initio molecular orbital (MO) calculations, focusing in particular on the changes in intermolecular interaction energy and vibrational frequency induced by adding an HCN molecule to the HCN-H2O or H2O-HCN dimer. The hydrogen bond distances calculated for the trimer are shorter than those in the corresponding dimers; that is, the addition of another HCN shortens the hydrogen bonds. The interaction energies of the hydrogen bonds are also increased by the addition of HCN. The dipole moment of the trimer is smaller than the sum of the dipole moments of the separate moieties (HCN + H2O + HCN), which is the opposite of the previous result for H2O-HCN-H2O. The spectral shifts of the stretching modes induced by hydrogen bond formation have been predicted by vibrational frequency analysis. The frequency of the asymmetric stretching mode of the HCN in the H2O-HCN part of the trimer is markedly red-shifted from that in the H2O-HCN dimer. This information is expected to be useful for the experimental detection of the trimer.
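The abstract does not say how the cooperativity of the hydrogen bonds was quantified. One standard way is a many-body decomposition of the trimer interaction energy into pairwise terms plus a nonadditive three-body term, with the dimer energies evaluated at the trimer geometry. A minimal sketch, assuming A = HCN, B = H2O, C = HCN, energies in hartree and no counterpoise (BSSE) correction; the function name and the energies in the usage block are hypothetical, not values from the study.

```python
HARTREE_TO_KCAL = 627.509474

def many_body_decomposition(E):
    """E: total energies (hartree) keyed by 'A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC',
    all dimers evaluated at the trimer geometry."""
    monomer = {k: E[k] for k in "ABC"}
    # Two-body terms: E_XY - E_X - E_Y
    pairs = {pair: E[pair] - monomer[pair[0]] - monomer[pair[1]]
             for pair in ("AB", "AC", "BC")}
    # Total interaction energy of the trimer
    total = E["ABC"] - sum(monomer.values())
    # Nonadditive part: E_ABC - sum(E_pairs) + sum(E_monomers)
    three_body = total - sum(pairs.values())
    return {"pairs": pairs, "three_body": three_body, "total": total}

if __name__ == "__main__":
    # Hypothetical energies in hartree, for illustration only.
    E = {"A": -93.0, "B": -76.0, "C": -93.0,
         "AB": -169.01, "AC": -186.0005, "BC": -169.008, "ABC": -262.021}
    terms = many_body_decomposition(E)
    for name, value in terms["pairs"].items():
        print(name, round(value * HARTREE_TO_KCAL, 2), "kcal/mol")
    print("three-body", round(terms["three_body"] * HARTREE_TO_KCAL, 2), "kcal/mol")
```

A negative three-body term indicates cooperative (nonadditive) stabilization: the trimer is more strongly bound than the sum of its pairwise interactions.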
The drug discovery process is extremely time-consuming, so efficient methodologies for accelerating it are highly desired. In silico drug discovery is one of the most promising techniques for this purpose. Lead generation with such technologies has attracted great attention in recent years, and many studies have been published. In ligand-based virtual screening, the quality of the pharmacophore models derived from potent known ligands affects the success of lead generation. Many methods for producing pharmacophore models have been reported; however, each has both merits and demerits, and no definitive method has been established. In this study, we propose a novel pharmacophore modeling method using a molecular alignment technique based on a Hopfield neural network (HNN). To validate the proposed method, we applied it to phosphodiesterase-4 (PDE4) inhibitors. Pre-calculated conformers of six known inhibitors were aligned using the HNN-based molecular alignment method, the aligned molecules were ranked by a newly developed simple scoring function, and pharmacophore models were then extracted. The resulting pharmacophore models were validated against X-ray crystal structures of PDE4 and previous work. The results demonstrate that our method successfully produces pharmacophore models for PDE4.
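The abstract does not describe the network's energy function. A minimal sketch, assuming the classic discrete Hopfield formulation of point matching, is shown below: neuron (i, j) is on when atom i of molecule A is matched to atom j of molecule B; two matches whose intramolecular distances agree excite each other, and matches that would reuse an atom inhibit each other. Only coordinates are used (no atom types), and the function names and the parameters tol, penalty and bias are hypothetical.

```python
import numpy as np

def pairwise_distances(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def hopfield_match(coords_a, coords_b, tol=0.3, penalty=10.0, bias=0.5, max_sweeps=50):
    """Discrete Hopfield network for atom matching. v[i, j] = 1 means atom i of A
    is matched to atom j of B. Returns {atom_in_A: atom_in_B}."""
    da, db = pairwise_distances(coords_a), pairwise_distances(coords_b)
    n, m = len(da), len(db)
    v = np.zeros((n, m), dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):          # asynchronous updates in a fixed order
            for j in range(m):
                h = bias            # net input to neuron (i, j)
                for k, l in zip(*np.nonzero(v)):
                    if k == i and l == j:
                        continue
                    if k == i or l == j:
                        h -= penalty    # atom already used: one-to-one constraint
                    elif abs(da[i, k] - db[j, l]) < tol:
                        h += 1.0        # distances agree: mutually supporting matches
                    else:
                        h -= 1.0        # distances disagree
                new = 1 if h > 0 else 0
                if new != v[i, j]:
                    v[i, j], changed = new, True
        if not changed:
            break
    return {int(i): int(j) for i, j in zip(*np.nonzero(v))}

if __name__ == "__main__":
    A = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.1, 0.0], [0.0, 0.0, 3.0]])
    rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    B = A @ rot.T + np.array([1.0, 2.0, 3.0])   # rotated and translated copy
    print(hopfield_match(A, B))
```

Because the weights are symmetric, asynchronous updates never increase the network energy, but the network can settle into a local minimum. In practice one would restart from several initial states and then superimpose the matched atoms (for example with the Kabsch algorithm) before scoring the alignment.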
In quantitative structure-activity relationships (QSAR), partial least squares (PLS) is of particular interest as a statistical method. Since its first successful applications to QSAR data sets, PLS has evolved to cope with the demands of more complex data structures. In particular, PLS variants focusing on visualization and chemical interpretation are highly desirable for molecular design. In this paper, we employed the self-organizing map PLS (SOMPLS) approach to predict multiple inhibitory activities against three serine proteases (Factor Xa, tryptase and urokinase-type plasminogen activator (uPA)). Retrosynthetic Combinatorial Analysis Procedure (RECAP) fingerprints, which encode the presence of specific substructures in a molecule, were used as chemical descriptors. From the SOMPLS analysis and the subsequent correlation map, the essential fragments for each serine protease were easily identified, and from the correlation map we designed the best combinations of fragments at each substituent position for each serine protease. The essential fragments were validated by computer-graphics inspection of X-ray crystal structures of the serine proteases. SOMPLS is a unique approach that makes data mining feasible through the visualization of structure-activity data from a ligand-based viewpoint.
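SOMPLS itself (the self-organizing map step) is not reproduced here. As a minimal sketch of the underlying PLS step, assuming one activity per model (one model per protease) and binary RECAP fragment bits as descriptors, NIPALS PLS1 can be written in NumPy as follows. The regression coefficients give one weight per fragment bit, which is the kind of quantity a fragment-activity map visualizes. The function names, fragment bits and activities in the usage block are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS1: one response y, descriptor matrix X (compounds x fragment bits)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered, then deflated each round
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)            # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        q = (yr @ t) / tt                    # y loading
        Xr = Xr - np.outer(t, p)             # deflate X and y
        yr = yr - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # coefficients in the original X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

if __name__ == "__main__":
    # Synthetic illustration: 30 hypothetical compounds x 8 hypothetical fragment bits.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(30, 8)).astype(float)
    for target in ("Factor Xa", "tryptase", "uPA"):
        y = X @ rng.normal(size=8) + rng.normal(0, 0.1, 30)  # made-up activities
        coef = pls1_fit(X, y, n_components=3)[2]
        print(target, "most positive fragment bit:", int(np.argmax(coef)))
```

With as many components as independent descriptors, PLS1 reproduces the ordinary least-squares fit; using fewer components is what gives PLS its robustness on wide, collinear fingerprint matrices.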
Pentacene thin films can be prepared from soluble precursors via the retro-Diels-Alder reaction. The precursors carry a variety of leaving groups and convert to pentacene at different temperatures. To clarify the relationship between the conversion temperature and the leaving group, we conducted a theoretical analysis, at the B3LYP/6-31G* level of theory, of the mechanism by which pentacene forms from several reported precursors. Our calculations showed that a low activation energy (Ea) results in a low conversion temperature. A PLS analysis confirmed that the Ea values correlate well with the heats of reaction and the LUMO energy levels of the leaving groups. To understand why the conversion temperatures depend on the solvents used to fabricate pentacene thin films, we investigated the changes in activation energy for retro-Diels-Alder reactions of the precursors in the presence of MeOH. The calculated activation energies of these systems were 0.7 kcal mol-1 lower than those without MeOH molecules, consistent with the observed trend. These theoretical results suggest that the pentacene conversion temperature can be controlled by choosing appropriate bridging reagents as well as an appropriate solvent for thin-film fabrication from the precursors.
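Why a lower Ea translates into a lower conversion temperature can be seen from the Arrhenius equation k = A exp(-Ea/RT): if the pre-exponential factor A is assumed to be the same for two precursors, they convert at the same rate when Ea/T is equal. The sketch below assumes that equal-A simplification; the function names, the 30 kcal mol-1 reference barrier and the 150 °C reference temperature are hypothetical illustration values, not results from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def rate_ratio(delta_ea, temperature):
    """k_new / k_ref when the barrier is lowered by delta_ea (kcal mol^-1) at fixed T,
    assuming the same pre-exponential factor A."""
    return math.exp(delta_ea / (R_KCAL * temperature))

def iso_rate_temperature(ea_ref, t_ref, ea_new):
    """Temperature at which a precursor with barrier ea_new converts as fast as the
    reference does at t_ref: equal A*exp(-Ea/RT) implies equal Ea/T."""
    return t_ref * ea_new / ea_ref

if __name__ == "__main__":
    ea_ref, t_ref = 30.0, 423.15  # hypothetical barrier and 150 degC reference
    print(rate_ratio(0.7, t_ref))
    print(iso_rate_temperature(ea_ref, t_ref, ea_ref - 0.7) - 273.15)
```

With these hypothetical numbers, a 0.7 kcal mol-1 lower barrier speeds the conversion up by a factor of about 2.3 at 150 °C, or equivalently lets the same rate be reached roughly 10 K lower.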