Proceedings of the Symposium on Chemoinformatics

Poster Session

Ternary crystal structure prediction from composition formula with machine learning method

Nobuyuki Miuchi, Kenichi Tanaka, Hiroshi Nakano, Masakazu Ukita, Raku ...

Pages 1P22-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

A major challenge in materials science is to predict crystal structures from given composition formulae, and to identify which chemical or physical properties most strongly influence the predicted structure. Here we propose a novel method for predicting ternary crystal structures with machine learning, using features calculated from nine chemical/physical properties. Our method predicted the crystal system from the composition formula with an accuracy of 0.682. We also investigated which chemical/physical properties are influential in predicting the crystal system. Feature contribution rates calculated by Random Forest, together with models rebuilt without the features related to each property, revealed that stoichiometry, especially the composition ratio, is essential for crystal system prediction. Because the method does not use any first-principles calculation, it is expected to shorten prediction time, which should speed up the screening of candidate materials and accelerate materials design.

A trial to investigate various quantities of molecules using machine learning

Ryoko Hayashi

Pages 1P23-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

In this study, we attempted automatic classification of boiling points and melting points using data mining. This paper presents the results obtained with decision trees. Only chemical substances consisting of carbon, oxygen, and hydrogen are considered in this abstract. The descriptors are the counts of characteristic partial structures in each substance, used directly. One decision tree classifying boiling points and another classifying melting points were built from data on 264 molecules. When 30 test molecules were input to the decision trees, the boiling point was predicted with an RMSE of about 34.7 and a maximum absolute error of 74.5 K, while the melting point had a maximum absolute error of 156 K. The melting point was difficult to classify and predict.
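
The two error measures quoted in this abstract, RMSE and maximum absolute error, can be illustrated with a minimal sketch. The function names and the boiling-point values below are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def max_abs_error(predicted, observed):
    """Largest absolute difference between prediction and observation."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return max(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical boiling points in kelvin (illustrative values only).
observed_k = [351.0, 373.0, 329.0, 391.0]
predicted_k = [341.0, 383.0, 349.0, 371.0]

print(rmse(predicted_k, observed_k))
print(max_abs_error(predicted_k, observed_k))
```

RMSE averages over all test molecules, whereas the maximum absolute error is set by the single worst prediction, which is why the abstract reports both.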

Electronic-structure simulation on hydrocarbon decomposition reaction in FAU zeolite

Takuro Kutsunugi, Manabu Sugimoto

Pages 1P24-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

Fluid catalytic cracking (FCC) is a process for obtaining high value-added components from heavy oil, and uses zeolite, a porous material, as the catalyst. In this work, we focused on the hydrocarbon decomposition reaction in FCC and investigated whether it can be reproduced by molecular dynamics calculations with the DFTB method. An arbitrary number of Al and H atoms were introduced into the faujasite-type crystal structure, so that Brønsted acid sites generated by the H atoms and Lewis acid sites generated by H2O desorption from the surface were formed on the pore surface. Using this model, we investigated the reaction of hydrocarbon molecules at various temperatures; the decomposition reaction proceeded only when a Lewis acid site was included. Differences in the decomposition products with reaction temperature were also confirmed. We found that the decomposition proceeds via hydride abstraction at the Lewis acid site.

Machine–Learning–Assisted synthesis for Efficiently of Synthesis Condition for Metastable Novel Metal–Organic Frameworks (MOFs) Using Failed Experiments

Yu Kitamura, Emi Terado, Daisuke Tanaka

Pages 1P25-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

Lanthanide-based metal–organic frameworks (Ln–MOFs) have been widely studied for luminescence and sensor applications. It is well known that the synthesis of multicomponent metal complexes suffers from crystal polymorphism that depends on the reaction environment, because various reaction intermediates exist under the synthesis conditions even when the same reactants are used. Moreover, Ln–MOFs generally show many crystal polymorphs because lanthanide ions adopt many flexible coordination geometries, resulting in isostructural crystal structures regardless of the metal ion. Therefore, it is difficult to synthesize a particular crystal polymorph selectively, and synthesis guidelines have not been established. In this study, we attempt to extract the dominant factors of the synthesis using two machine learning techniques. We performed solvothermal syntheses using four kinds of lanthanide metal ions and terephthalic acid, and examined various reaction conditions. Cluster analysis of the synthesis results revealed an unknown phase 1, and its synthesis conditions were explored using a decision tree. We found that the reaction conditions for synthesizing unknown phase 1 depend strongly on reagent purity.

Integrating experimental and simulation data via preference learning

Xiaolin Sun, Zhufeng Hou, Masato Sumita, Ryo Tamura, Koji Tsuda

Pages 1P26-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

Experimental measurements usually provide high-accuracy values, whereas computational simulations usually provide abundant low-accuracy values. We present a model based on preference learning that integrates multi-fidelity data, improving prediction performance by supplementing insufficient experimental data with additional low-accuracy data. In general, however, experimental and simulation data cannot be directly compared, which makes this integration difficult. The framework consists of Gaussian process preference learning and Bayesian optimization. To evaluate the integration, models were trained on several datasets of band gaps in oxides and wavelengths in molecules from different sources, and assessed by their ranking and optimization-iteration results. Compared with models trained only on preferences from experimental data, models that also use preference information from simulation data perform better, which confirms our primary hypothesis. We also explored and discuss how the quality of the simulation data influences prediction performance.
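
One way to picture how preference information sidesteps the problem of incomparable scales is the sketch below. It assumes that pairwise preferences are formed only between items measured by the same source, so an experimental value is never compared numerically with a simulated one. This is an illustrative assumption, not a description of the authors' implementation; the function name, source labels, and values are hypothetical.

```python
from itertools import combinations


def within_source_preferences(values_by_source):
    """Build (source, preferred_item, other_item) triples.

    Items are compared only with other items from the same source.
    Ties produce no preference.
    """
    preferences = []
    for source, values in values_by_source.items():
        for a, b in combinations(sorted(values), 2):
            if values[a] > values[b]:
                preferences.append((source, a, b))
            elif values[b] > values[a]:
                preferences.append((source, b, a))
    return preferences


# Hypothetical band-gap-like values for materials A, B, C (illustrative only).
data = {
    "experiment": {"A": 3.2, "B": 1.1},
    "simulation": {"A": 2.0, "B": 0.4, "C": 2.9},
}

for triple in within_source_preferences(data):
    print(triple)
```

The resulting triples from both sources could then be pooled as training input for a preference learner, since each triple only states which of two items ranks higher.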

Exploration for organic anode materials for lithium ion batteries using materials informatics

Takumi Komura, Yasuhiko Igarashi, Hiromichi Numazawa, Hiroaki Imai, Yu ...

Pages 1P27-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

Our research group has focused on experiment-oriented materials informatics (MI), which combines small experimental datasets and machine learning with the experience and intuition of researchers. The present work applies MI to the exploration of organic anode materials for lithium-ion batteries (LIBs). Design strategies for high-performance organic anodes for LIBs have not been fully studied. Using machine learning, 16 important factors for specific capacity were extracted from 24 explanatory variables. Three of these 16 factors were then selected on the basis of chemical relevance. Capacity prediction model A, built with these 3 descriptors, assisted efficient exploration of new compounds with specific capacity. However, its prediction accuracy was not sufficient for exploring new compounds with high specific capacity. An improved prediction model B was therefore constructed on a larger training dataset. The accuracy of prediction model B was studied using data from previous works on high-performance organic anodes.

Materials-informatics-assisted lateral-size control of surface-modified nanosheets

Ryosuke Mizukuchi, Yasuhiko Igarashi, Gentoku Nakata, Hiroaki Imai, Yu ...

Pages 1P28-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

Materials informatics (MI) has been studied in recent years. In typical MI, machine learning based on large-scale data and calculations is used to construct prediction models for exploring new materials and functions. However, MI has not been fully applied to small experimental datasets. Our research group has achieved high-yield synthesis of nanosheets by applying MI to small-scale data. The nanosheets are obtained by exfoliating layered compounds modified with interlayer organic molecules in nonpolar organic media. Here we applied MI to predict and control the lateral size of the nanosheets. The prediction model was built on a training dataset that includes measured yields, using sparse modeling with the minimax concave penalty (MCP) combined with the experience and intuition of researchers. Using this model, we predicted the lateral size of nanosheets for 120 unknown guest-medium combinations prior to the experiments. Larger and smaller nanosheets were then obtained experimentally, in accordance with the predictions. The prediction model can be applied to a variety of exfoliation systems to obtain nanosheets of controlled size.
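
For readers unfamiliar with the minimax concave penalty, the following minimal sketch evaluates the standard MCP function for a single coefficient. The function and parameter names (`mcp_penalty`, `lam`, `gamma`) are hypothetical, and the abstract does not state which parameter values or software the authors used.

```python
def mcp_penalty(coef, lam, gamma):
    """Minimax concave penalty for one regression coefficient.

    lam   : regularization strength (non-negative)
    gamma : concavity parameter (positive); larger values make the
            penalty behave more like the L1 (lasso) penalty
    """
    if lam < 0 or gamma <= 0:
        raise ValueError("lam must be >= 0 and gamma must be > 0")
    magnitude = abs(coef)
    if magnitude <= gamma * lam:
        # Penalty grows more and more slowly as the coefficient grows.
        return lam * magnitude - magnitude ** 2 / (2.0 * gamma)
    # Beyond gamma * lam the penalty is constant.
    return 0.5 * gamma * lam ** 2


# Illustrative values only.
for value in (0.0, 1.0, 3.0, 10.0):
    print(value, mcp_penalty(value, lam=1.0, gamma=3.0))
```

Because the penalty flattens out for large coefficients, strongly relevant descriptors are shrunk less than under an L1 penalty, while small coefficients are still driven to zero.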

Size-distribution control of nanosheets based on prediction by materials informatics.

Yuri Haraguchi, Yasuhiko Igarashi, Gentoku Nakada, Hiroaki Imai, Yuya ...

Pages 1P29-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

Our research group has developed experiment-oriented materials informatics, which combines small-scale experimental data with the experience and intuition of researchers. Nanosheets are obtained by exfoliating layered organic-inorganic composites of host titanate layers and guest organic molecules in nonpolar organic media. Our aim here is to construct a prediction model for controlling the size distribution of the resulting nanosheets. The nanosheets were synthesized from layered composites with 104 different combinations of guest molecules and dispersion media. The size distribution of the nanosheets was measured by dynamic light scattering (DLS) and used as the objective variable. A total of 33 potential parameters related to the size distribution, including physical properties, calculated values, and experimental data, were set as explanatory variables. The descriptors were extracted by combining machine learning with chemical relevance, and the prediction model was constructed by this sparse modeling. After validating the prediction accuracy, we predicted the size distribution of unknown guest-medium combinations prior to the experiments. Control over narrow and wide size distributions was achieved in a minimum number of experiments.

Classification QSAR with Vanishing Kernels and a single parameter

Berenger Francois, Yoshihiro Yamanishi

Pages 1P30-
Published: 2019
Released on J-STAGE: October 22, 2019

CONFERENCE PROCEEDINGS FREE ACCESS

In this poster, we present a technique for rank-ordering molecules. The ability to rank-order molecules makes classification QSAR possible. The technique requires a single parameter, the kernel bandwidth, to be optimized during model training. In essence, Kernel Density Estimates (KDE) are revisited and used to rank-order molecules. To make KDE compute-efficient, two modifications are proposed: i) using “vanishing kernels”, i.e. kernel functions with bounded support, and ii) using the Tanimoto distance between chemical fingerprints as a radial basis function in order to work in one dimension. We call this modified construction “Vanishing Ranking Kernels”. Using this construction on two real-world datasets, one from toxicology and one from High Throughput Screening (HTS) experiments, we show that Ranking Kernels can compete in performance with a state-of-the-art deep-learning implementation for molecules. Ranking Kernels are conceptually simple. They require only a single parameter to be optimized and hence are fast to train. Once trained, they also define a Boolean Applicability Domain (AD) for free. In our experiments, this AD allows candidate molecules to be screened at least 69% faster than without an AD.
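
The ingredients named in this abstract (a bounded-support kernel, the Tanimoto distance, a single bandwidth, and a Boolean applicability domain) can be sketched as follows. Several details are assumptions made for illustration and are not taken from the poster: fingerprints are represented as sets of on-bit indices, the kernel has an Epanechnikov-like shape, the ranking score is the mean kernel value over active training molecules minus that over inactive ones, and a query counts as inside the AD when at least one training molecule lies within the bandwidth. All names are hypothetical.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def vanishing_kernel(distance, bandwidth):
    """Bounded-support kernel: exactly zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    u = distance / bandwidth
    return 1.0 - u * u


def rank_score(query, actives, inactives, bandwidth):
    """Return (score, in_domain) for one query fingerprint.

    Assumes non-empty training lists and a positive bandwidth.
    """
    act = [vanishing_kernel(tanimoto_distance(query, fp), bandwidth)
           for fp in actives]
    ina = [vanishing_kernel(tanimoto_distance(query, fp), bandwidth)
           for fp in inactives]
    in_domain = any(k > 0.0 for k in act + ina)
    score = sum(act) / len(act) - sum(ina) / len(ina)
    return score, in_domain


# Toy fingerprints (illustrative only).
actives = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
inactives = [frozenset({7, 8, 9})]

print(rank_score(frozenset({1, 2, 3, 4}), actives, inactives, bandwidth=0.5))
print(rank_score(frozenset({20, 21}), actives, inactives, bandwidth=0.5))
```

In this sketch, a query outside the AD receives zero weight from every training molecule, so it can be discarded without further scoring; this is one plausible reading of how a Boolean AD could speed up screening.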
