Handling of Highly Symmetric Molecules for Chemical Structure Elucidation in a CAST/CNMR System

Shungo KOICHI; Hiroyuki KOSHINO; Hiroko SATOH

doi:10.2477/jccj.2015-0067

Abstract

In chemical structure elucidation, symmetric structures sometimes require careful handling. In the course of developing a system for NMR-based structure elucidation, we have devised a method for processing highly symmetric chemical structures by applying an efficient graph inference algorithm. In this letter, we outline the algorithm using bibenzyl as an example. The method has been implemented in CAST/CNMR, a system for NMR-based structure elucidation and chemical shift prediction.

1 INTRODUCTION

We have been developing CAST/CNMR, a data-centric system for chemical structure analysis based on NMR data [1,2,3,4,5,6,7]. CAST/CNMR consists of two modules: one for predicting ¹³C-NMR chemical shifts from a chemical structure (CShift-Predictor) and one for elucidating a structure from ¹³C-NMR chemical shift values (CAST/CNMR Structure Elucidator). Both are classical topics in chemoinformatics and have been studied intensively over the last few decades [8,9,10,11,12,13,14,15,16,17,18,19,20]. The CAST/CNMR system has been under development for 20 years. Our motivation has been the analysis of chemical structures with complex stereochemistry, such as those of natural organic compounds, for which we wanted a system that handles complex structures well enough for practical use. Starting with the development of an original coding method for chemical structures, CAST [1,2,3], we have developed the CAST/CNMR system [1,2,3,4,5,6,7] together with a ¹³C-NMR chemical shift–chemical structure database described in the CAST notation. The database consists of 5,413 organic compounds, mainly natural products and their synthetic intermediates, and contains 91,530 carbons. We have revised several reported chemical structures by applying CAST/CNMR [21,22,23]. The CAST notation is designed to give a canonical code for a chemical structure, consisting of notations for topological connectivity, absolute/relative configuration, and conformational structure; the stereochemical information is described by dihedral angles. To apply CAST/CNMR to a wider variety of molecular systems, we have developed a method for handling highly symmetric molecules in CAST/CNMR Structure Elucidator. In this article, we briefly describe the new method and its application to bibenzyl as an example.

2 FRAGMENT-BASED STRUCTURE ELUCIDATION

The structure elucidation procedure in CAST/CNMR Structure Elucidator is as follows:

Step I. Search the database for substructures whose assigned chemical shifts correspond to a subset of the observed ¹³C-NMR chemical shifts.

Step II. Construct a candidate structure by merging the substructures retrieved in step I on the basis of their maximal common parts.
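The matching test behind step I can be sketched in a few lines. This is a minimal sketch, not the CAST/CNMR search itself: it assumes a fragment is nothing more than a list of assigned ¹³C shifts, ignores the CAST structure codes, and uses a hypothetical helper name, fragment_matches. A fragment qualifies if each of its shifts can be paired with a distinct observed shift within a tolerance α, such as the 1.0 ppm used in Section 4.

```python
def fragment_matches(fragment_shifts, observed_shifts, alpha=1.0):
    """Return True if every shift of a database fragment can be paired with a
    distinct observed shift lying within +/- alpha ppm of it (hypothetical helper)."""
    observed = sorted(observed_shifts)
    used = [False] * len(observed)
    # All acceptance windows [f - alpha, f + alpha] have the same width, so taking
    # fragment shifts in ascending order and giving each the smallest unused
    # observed shift inside its window never blocks a later match.
    for f in sorted(fragment_shifts):
        for i, o in enumerate(observed):
            if not used[i] and f - alpha <= o <= f + alpha:
                used[i] = True
                break
        else:
            return False  # no free observed shift left for this fragment shift
    return True


# Query shifts of bibenzyl (Section 4) and two made-up fragments.
observed = [141.8] * 2 + [128.5] * 4 + [128.4] * 4 + [125.9] * 2 + [38.0] * 2
print(fragment_matches([141.9, 129.1, 128.8, 126.0], observed))  # True
print(fragment_matches([141.9, 141.9, 141.9], observed))         # False
```

The one-to-one requirement is what rejects the second fragment: only two observed shifts lie near 141.9 ppm, so a third copy cannot be accommodated.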

For example, when the three substructures in Figure 1 are retrieved in step I, the rightmost structure is constructed in step II [7].

Figure 1.

Fragment-based structure elucidation procedure. The rightmost structure is elucidated by assembling the three substructures.

The elucidation may fail, however, when the correct structure contains multiple copies of the same substructure or a recursive structure. Consider, for example, bibenzyl in Figure 2. Although the search results depend on the contents of the database, the substructures retrieved in step I are likely to include a part of a phenyl group (A) and a structure containing an ethylene group (B). In this case, two copies of substructure A must be assembled with B to form bibenzyl by identifying edges of the same color. However, because the procedure merges substructures at their maximal common part, the second copy of A completely overlaps the first. Hence, structure C, consisting only of the blue lines of the bibenzyl structure, is eventually elucidated.

Figure 2.

An unsuccessful case of elucidation. Due to the symmetric structure, one of the A structures cannot be used correctly.
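The failure can be reproduced with a toy model in which a fragment is reduced to a multiset of labeled bonds. This is an illustration only: the atom labels (ipso, ortho, meta, para, CH2) and the contents of fragments A and B are assumptions, not CAST codes or actual database entries. Merging at the maximal common part is approximated by the union of Python's collections.Counter, which keeps the larger count of each shared bond, whereas bibenzyl needs the sum of two copies of A plus B.

```python
from collections import Counter


def bond_counts(*label_pairs):
    """Multiset of bonds, each keyed by its sorted pair of atom labels."""
    return Counter(tuple(sorted(pair)) for pair in label_pairs)


# Hypothetical fragments: A is a phenyl part, B contains the ethylene bridge.
A = bond_counts(("ipso", "ortho"), ("ipso", "ortho"),
                ("ortho", "meta"), ("ortho", "meta"),
                ("meta", "para"), ("meta", "para"))
B = bond_counts(("CH2", "CH2"), ("CH2", "ipso"), ("CH2", "ipso"))

# Overlap-style merge: shared bonds are identified, so counts combine by max
# and the second copy of A disappears.
overlapped = A | A | B
# What bibenzyl actually needs: two copies of A plus B, counts combine by sum.
required = A + A + B

print(sum(overlapped.values()), sum(required.values()))  # 9 15
```

The 15 bonds of the summed version match bibenzyl's carbon skeleton (12 ring bonds plus 3 involving the CH2 groups); the union keeps only 9.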

To solve this problem, we used an efficient algorithm for inferring a graph from path frequency, proposed by Nagamochi [24] in the field of combinatorial optimization.

3 GRAPH INFERENCE PROBLEM

A graph inference problem asks for a vertex-labeled graph given the number of vertices carrying each label and the number of edges between vertices for each pair of labels. That is, given these numbers, called required numbers, one infers a graph consistent with them. When the algorithm proposed in [24] is applied to a chemical structure, the structure is treated as a chemical graph whose vertices are atoms and whose edges are bonds. A vertex label corresponds to an atom type or, more precisely, to the CAST code for topological connectivity [1]; bond types are not considered. For example, Figure 3 shows the required numbers for the chemical graph of phenol. Once the required numbers are given, the graph inference algorithm can be used to elucidate a structure.

Figure 3.

The required numbers for a chemical graph of phenol.
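Computing the required numbers of a known labeled graph is straightforward, and the sketch below does so for the heavy atoms of phenol. It assumes hypothetical vertex labels (element symbol plus number of heavy-atom neighbors) in place of the CAST connectivity codes, so the counts illustrate the idea rather than reproduce Figure 3. Bond order is ignored, as in the text. The inference algorithm of [24] solves the reverse problem and is not implemented here.

```python
from collections import Counter


def required_numbers(labels, edges):
    """Return (vertex_counts, edge_counts) for a vertex-labeled graph.

    labels: dict mapping vertex id -> label
    edges:  iterable of (u, v) vertex-id pairs; bond order is ignored
    """
    vertex_counts = Counter(labels.values())
    edge_counts = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    return vertex_counts, edge_counts


# Phenol heavy atoms: ring carbons 0-5 (carbon 0 bears the OH) and oxygen 6.
# Hypothetical labels: element + number of heavy-atom neighbours.
labels = {0: "C3", 1: "C2", 2: "C2", 3: "C2", 4: "C2", 5: "C2", 6: "O1"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

v, e = required_numbers(labels, edges)
print(dict(v))  # {'C3': 1, 'C2': 5, 'O1': 1}
print(dict(e))  # {('C2', 'C3'): 2, ('C2', 'C2'): 4, ('C3', 'O1'): 1}
```

Sorting each label pair makes the edge key independent of the direction in which a bond is listed.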

4 METHOD AND APPLICATION

To illustrate the concept of the new method, we demonstrate how it elucidates the bibenzyl structure starting from structure C in Figure 2. The correct structure with its chemical shift assignment is shown in Figure 4 (i). The input is a set of query ¹³C NMR chemical shift (C-shift) values, 141.8 (×2), 128.5 (×4), 128.4 (×4), 125.9 (×2), and 38.0 (×2) ppm, together with a tolerance parameter α = 1.0 ppm that is used to classify chemical shift data. The elucidation proceeds as follows:

Figure 4.

Structure elucidation of bibenzyl by using the graph inference algorithm. (i) The correct structure with observed ¹³C NMR chemical shift (C-shift) assignment in ppm. (ii) The elucidated structure with C-shift values assigned based on the data retrieved from the structure-spectrum database.

(1) Group the query chemical shifts according to α. In this case, we obtain four groups, Q₁, Q₂, Q₃, and Q₄:

Q₁ = {141.8, 141.8}

Q₂ = {128.5, 128.5, 128.5, 128.5, 128.4, 128.4, 128.4, 128.4}

Q₃ = {125.9, 125.9}

Q₄ = {38.0, 38.0}

(2) Similarly, group the chemical shift values of structure C, which were retrieved from the database (141.9 (×2), 129.1 (×4), 128.8 (×4), 126.0 (×1), and 37.4 (×2) ppm), according to α. In this step, however, two carbon atoms that belong to different structural classes are assigned to different groups even if their C-shift values fall in the same group; this is why the carbons at 129.1 and 128.8 ppm, only 0.3 ppm apart, are separated. As a result, we obtain five groups, C₁, C₂, C₃, C₄, and C₅:

C₁ = {141.9, 141.9}

C₂ = {129.1, 129.1, 129.1, 129.1}

C₃ = {128.8, 128.8, 128.8, 128.8}

C₄ = {126.0}

C₅ = {37.4, 37.4}

(3) Some of the groups obtained in (2) are then merged on the basis of the similarity of their chemical shift values; here, C₂ and C₃ are merged into a new group C₆. Each of the resulting groups is then paired with a query group from (1) according to α, which gives the following pairs:

(a) Q₁ and C₁, (b) Q₂ and C₆, (c) Q₃ and C₄, (d) Q₄ and C₅

(4) For each pair obtained in (3), calculate how many copies of each carbon atom are needed to account for the query chemical shifts. Concretely, if Q_i corresponds to C_j, the required number of each carbon atom in C_j is defined as |Q_i|/|C_j|. This ratio is 1 for pairs a, b, and d, but 2 for pair c (|Q₃|/|C₄| = 2/1), so the carbon atom at 126.0 ppm must be duplicated.

(5) From the required numbers for atoms determined in (4), calculate the required numbers for edges. For example, the required number of edges between carbon atoms at 141.9 ppm and those at 129.1 ppm is four. For the pair of 126.0 and 128.8 ppm, the required number is also four, because the carbon atom at 126.0 ppm was duplicated in the previous step.

(6) We are now ready to apply the graph inference algorithm. Using the required numbers calculated in (4) and (5), the structure with the C-shift assignment shown in Figure 4 (ii) is elucidated.
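Steps (1) to (5) can be strung together in a short script that reproduces the numbers quoted in the procedure. It is a sketch under several assumptions, none of which is stated in the text: grouping is single-linkage on sorted shifts (a new group starts when the gap exceeds α); the structural classes of structure C carry hypothetical names; candidate groups are merged when their mean shift lies within α of a query group's mean; and in step (5) each bond is counted max(multiplicity of its two end atoms) times on a made-up excerpt of C's bonds. These rules were chosen only because they reproduce the example values. The graph inference of step (6) is not implemented.

```python
from collections import Counter
from fractions import Fraction

ALPHA = 1.0  # tolerance alpha in ppm, as in the text


def group(atoms, alpha=ALPHA):
    """Steps (1)/(2): sort (shift, cls) pairs by shift, start a new group when the
    gap to the previous shift exceeds alpha (assumed rule), then split each
    group by structural class (cls is None for query shifts)."""
    runs = []
    for atom in sorted(atoms, key=lambda a: a[0], reverse=True):
        if runs and runs[-1][-1][0] - atom[0] <= alpha:
            runs[-1].append(atom)
        else:
            runs.append([atom])
    groups = []
    for run in runs:
        by_class = {}
        for shift, cls in run:
            by_class.setdefault(cls, []).append(shift)
        groups.extend(by_class.values())
    return groups


def mean(g):
    return sum(g) / len(g)


def pair(query_groups, cand_groups, alpha=ALPHA):
    """Step (3): merge every candidate group whose mean lies within alpha of a
    query group's mean (assumed rule) and pair the merged group with it."""
    return [(q, [s for c in cand_groups if abs(mean(c) - mean(q)) <= alpha for s in c])
            for q in query_groups]


def multiplicity(q, c):
    """Step (4): required number of copies of each atom in C_j is |Q_i|/|C_j|."""
    r = Fraction(len(q), len(c))
    if r.denominator != 1:
        raise ValueError(f"non-integer ratio {r}")  # case not covered by the text
    return int(r)


query = [(s, None) for s in [141.8] * 2 + [128.5] * 4 + [128.4] * 4 + [125.9] * 2 + [38.0] * 2]
# Shifts of structure C, with hypothetical class names for its carbons.
cand = ([(141.9, "ipso")] * 2 + [(129.1, "ortho")] * 4 + [(128.8, "meta")] * 4
        + [(126.0, "para")] + [(37.4, "CH2")] * 2)

Q = group(query)  # four groups Q1..Q4
C = group(cand)   # five groups C1..C5
mult = {}
for q, c in pair(Q, C):
    m = multiplicity(q, c)
    for s in c:
        mult[s] = m
print(mult)  # {141.9: 1, 129.1: 1, 128.8: 1, 126.0: 2, 37.4: 1}

# Required vertex numbers: count of each shift in C times its multiplicity.
req_vertices = Counter({s: n * mult[s] for s, n in Counter(s for s, _ in cand).items()})
print(sum(req_vertices.values()))  # 14

# Step (5) on a hypothetical excerpt of C's bonds (atom ids are made up).
shift_of = {"i1": 141.9, "i2": 141.9, "o1": 129.1, "o2": 129.1, "o3": 129.1,
            "o4": 129.1, "p": 126.0, "m1": 128.8, "m2": 128.8}
bonds = [("i1", "o1"), ("i1", "o2"), ("i2", "o3"), ("i2", "o4"), ("p", "m1"), ("p", "m2")]
edge_req = Counter()
for u, v in bonds:
    su, sv = shift_of[u], shift_of[v]
    edge_req[tuple(sorted((su, sv)))] += max(mult[su], mult[sv])
print(edge_req[(129.1, 141.9)], edge_req[(126.0, 128.8)])  # 4 4
```

The total of 14 required carbons agrees with the 14 query shifts, and the two edge counts match the values given in step (5).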

5 CONCLUSION

We have developed a new method for handling highly symmetric molecules in CAST/CNMR Structure Elucidator by applying a graph inference algorithm, and demonstrated the concept using bibenzyl. We plan to apply the new elucidator to more complex structures as well.

We are preparing to release the CAST/CNMR program package to the public. Until then, readers interested in trying the program are welcome to contact any of the authors.

S.K. was partly supported by Nanzan University Pache Research Subsidy I-A-2 for the 2014 academic year.

References

[1] H. Satoh, H. Koshino, J. Chem. Inf. Comput. Sci., 40, 622 (2000).
[2] H. Satoh, H. Koshino, K. Funatsu, T. Nakata, J. Chem. Inf. Comput. Sci., 41, 1106 (2001).
[3] H. Satoh, H. Koshino, T. Nakata, J. Comput. Aided Chem., 3, 48 (2002).
[4] S. Koichi, S. Iwata, T. Uno, H. Koshino, H. Satoh, J. Chem. Inf. Model., 47, 1734 (2007).
[5] H. Satoh, H. Koshino, J. Uzawa, T. Nakata, Tetrahedron, 59, 4539 (2003).
[6] H. Satoh, H. Koshino, T. Uno, S. Koichi, S. Iwata, T. Nakata, Tetrahedron, 61, 7431 (2005).
[7] S. Koichi, M. Arisaka, H. Koshino, A. Aoki, S. Iwata, T. Uno, H. Satoh, J. Chem. Inf. Model., 54, 1027 (2014).
[8] J. Lederberg, G. L. Sutherland, B. G. Buchanan, E. A. Feigenbaum, A. V. Robertson, A. M. Duffield, C. Djerassi, J. Am. Chem. Soc., 91, 2973 (1969).
[9] B. G. Buchanan, E. A. Feigenbaum, Artif. Intell., 11, 5 (1978).
[10] K. Funatsu, S. Sasaki, J. Chem. Inf. Comput. Sci., 36, 190 (1996).
[11] W. Bremser, Angew. Chem. Int. Ed. Engl., 27, 247 (1988).
[12] W. Bremser, M. Grzonka, Mikrochim. Acta, 104, 483 (1991).
[13] http://www.molgen.de (accessed November 25th, 2015)
[14] http://www.acdlabs.com (accessed November 25th, 2015)
[15] http://www.bio-rad.com (accessed November 25th, 2015)
[16] H. Masui, H. Hong, J. Chem. Inf. Model., 46, 775 (2006).
[17] M. E. Elyashberg, Y. Z. Karasev, E. R. Martirosian, H. Thiele, H. Somberg, Anal. Chim. Acta, 348, 443 (1997).
[18] M. E. Elyashberg, K. A. Blinov, E. R. Martirosian, Lab. Autom. Inf. Manage., 34, 15 (1999).
[19] K. A. Blinov, M. E. Elyashberg, S. G. Molodtsov, A. J. Williams, E. R. Martirosian, Fresenius J. Anal. Chem., 369, 709 (2001).
[20] J. E. Peironcely, M. Rojas-Cherto, D. Fichera, T. Reijmers, L. Coulier, J.-L. Faulon, T. Hankemeier, J. Cheminf., 4, 21 (2012).
[21] H. Koshino, H. Satoh, T. Yamada, Y. Esumi, Tetrahedron Lett., 47, 4623 (2006).
[22] S. Takahashi, H. Satoh, Y. Hongo, H. Koshino, J. Org. Chem., 72, 4578 (2007).
[23] S. Takahashi, M. Yasuda, T. Nakamura, K. Hatano, K. Matsuoka, H. Koshino, J. Org. Chem., 79, 9373 (2014).
[24] H. Nagamochi, Algorithmica, 53, 207 (2009).
