Journal of Computer Chemistry, Japan
Online ISSN : 1347-3824
Print ISSN : 1347-1767
ISSN-L : 1347-1767
Handling of Highly Symmetric Molecules for Chemical Structure Elucidation in a CAST/CNMR System

2016, Volume 14, Issue 6, pp. 193-195


In chemical structure elucidation, symmetric structures sometimes need careful handling. In the course of developing a system for NMR-based structure elucidation, we have developed a method for processing highly symmetric chemical structures by applying an efficient graph inference algorithm. In this letter, we outline the algorithm, taking bibenzyl as an example. The method has been implemented in CAST/CNMR, a system for NMR-based structure elucidation and chemical shift prediction.


We have been developing CAST/CNMR, a data-centric system for chemical structure analysis based on NMR data [1,2,3,4,5,6,7]. CAST/CNMR consists of two modules: one for predicting 13C-NMR chemical shifts from a chemical structure (CShift-Predictor) and one for structure elucidation from 13C-NMR chemical shift values (CAST/CNMR Structure Elucidator). These classical topics in chemoinformatics have been studied intensively over the last couple of decades [8,9,10,11,12,13,14,15,16,17,18,19,20]. The CAST/CNMR system has been under development for 20 years. Motivated by the analysis of chemical structures with complex stereochemistry, such as natural organic compounds, we have aimed to develop a system that can handle complex structures well enough for practical use. Starting with the development of an original coding method for chemical structures, CAST [1,2,3], we have developed the CAST/CNMR system [1,2,3,4,5,6,7] together with a database of 13C-NMR chemical shifts and chemical structures described in the CAST notation. The database consists of 5,413 organic compounds, mainly natural products and their synthetic intermediates, and contains 91,530 carbons. We have revised some reported chemical structures by applying CAST/CNMR [21,22,23]. The CAST notation is designed to give a canonical code for a chemical structure, consisting of notations for topological connectivity, absolute/relative configuration, and conformation. The stereochemical information is described using dihedral angles. To apply CAST/CNMR to a wider variety of molecular systems, we have developed a method for handling highly symmetric molecules in CAST/CNMR Structure Elucidator. In this article, we briefly describe the new method and its application to bibenzyl as an example.


The computing procedure for structure elucidation in CAST/CNMR Structure Elucidator is as follows:

Step I. Search for substructures whose assigned chemical shifts correspond to a part of the observed 13C-NMR chemical shifts.

Step II. Construct a candidate structure by merging the substructures retrieved in step I on the basis of their maximal common parts.

For example, when three substructures in Figure 1 are retrieved in step I, the rightmost structure is constructed in step II [7].

Figure 1.

 Fragment-based structure elucidation procedure. The rightmost structure is elucidated by assembling the three substructures.

The elucidation, however, may fail when the correct structure contains multiple copies of the same substructure or a recursive structure. For example, consider bibenzyl in Figure 2. Although the search result depends on the contents of the database, the substructures retrieved in step I typically include a part of a phenyl group (A) and a structure containing an ethylene group (B). In this case, two copies of substructure A must be assembled with B to form bibenzyl by identifying the edges of the same color. However, since the procedure merges substructures at their maximal common part, the second copy of A completely overlaps with the first one. Hence, structure C, consisting only of the blue lines of the bibenzyl structure, is eventually elucidated.

Figure 2.

 An unsuccessful case of elucidation. Due to the symmetric structure, one of the A structures cannot be used correctly.

In order to solve this problem, we have utilized an efficient algorithm for inferring a graph from path frequency proposed by Nagamochi [24] in the field of combinatorial optimization.


A graph inference problem asks for a graph with vertex labels, given the number of vertices associated with each label and the number of edges between vertices associated with each pair of labels. That is, given those numbers, called required numbers, one infers a graph consistent with them. When the algorithm proposed in [24] is applied to a chemical structure, the structure is treated as a chemical graph with vertices (atoms) and edges (bonds). The vertex label corresponds to an atom type or, more technically, the CAST code for the topological connectivity [1]. Bond types are not considered. For example, Figure 3 shows the required numbers of a chemical graph of phenol. If the required numbers are given, the graph inference algorithm can be used to elucidate a structure.

Figure 3.

 The required numbers for a chemical graph of phenol.
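As an illustration of what required numbers are, the following minimal sketch counts vertex labels and label-pair edges for the heavy-atom skeleton of phenol. It assumes plain element symbols as vertex labels, whereas the method itself uses CAST codes, so the numbers in Figure 3 may be organized differently. The function name `required_numbers` and the atom numbering are hypothetical.

```python
from collections import Counter


def required_numbers(labels, bonds):
    """Count vertices per label and edges per unordered label pair.

    Bond types are ignored, as in the text.
    """
    vertex_counts = Counter(labels.values())
    edge_counts = Counter(
        tuple(sorted((labels[u], labels[v]))) for u, v in bonds
    )
    return vertex_counts, edge_counts


# Assumed toy encoding of phenol (hydrogens omitted):
# ring carbons are atoms 0-5, the oxygen is atom 6 bonded to carbon 0.
phenol_labels = {i: "C" for i in range(6)}
phenol_labels[6] = "O"
phenol_bonds = [(i, (i + 1) % 6) for i in range(6)] + [(0, 6)]

vertex_counts, edge_counts = required_numbers(phenol_labels, phenol_bonds)
print(dict(vertex_counts))  # {'C': 6, 'O': 1}
print(dict(edge_counts))    # {('C', 'C'): 6, ('C', 'O'): 1}
```

The graph inference problem runs in the opposite direction: given only these two tables of counts, it asks for a graph that is consistent with them.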


To illustrate the concept of the new method, we demonstrate that the bibenzyl structure is elucidated by the method from C in Figure 2. The correct structure with chemical shift assignment is shown in Figure 4 (i). The input is a set of query 13C NMR chemical shift (C-shift) values (141.8 (×2), 128.5 (×4), 128.4 (×4), 125.9 (×2), and 38.0 (×2) in ppm) with a tolerance parameter α = 1.0 ppm, which is used for classifying chemical shift data. The elucidation is performed by the following procedure:

Figure 4.

 Structure elucidation of bibenzyl by using the graph inference algorithm. (i) The correct structure with observed 13C NMR chemical shift (C-shift) assignment in ppm. (ii) The elucidated structure with C-shift values assigned based on the data retrieved from the structure-spectrum database.

(1) Group the query chemical shifts according to α. In this case, we obtain the following groups Q1, Q2, Q3, and Q4.

Q1 = {141.8, 141.8},

Q2 = {128.5, 128.5, 128.5, 128.5, 128.4, 128.4, 128.4, 128.4},

Q3 = {125.9, 125.9},

Q4 = {38.0, 38.0}

(2) Similarly, group the chemical shift values of structure C (141.9 (×2), 129.1 (×4), 128.8 (×4), 126.0 (×1), and 37.4 (×2) in ppm), which are retrieved from the database, also according to α. In this step, however, two carbon atoms in different structural classes are assigned to different groups even if their C-shift values fall in the same group. As a result, we obtain the following groups C1, C2, C3, C4, and C5.

C1 = {141.9, 141.9},

C2 = {129.1, 129.1, 129.1, 129.1},

C3 = {128.8, 128.8, 128.8, 128.8},

C4 = {126.0},

C5 = {37.4, 37.4}

(3) Based on the similarity of chemical shift values, some groups obtained in (2) are united. In this case, C2 and C3 are merged to give group C6. To each of the resulting groups, assign the corresponding group obtained in (1) according to α. As a result, we obtain the following pairs of groups.

(a) Q1 and C1, (b) Q2 and C6, (c) Q3 and C4, (d) Q4 and C5

(4) For each group obtained in (3), calculate the number of carbon atoms needed to satisfy the set of the query chemical shifts. Concretely, if Qi corresponds to Cj, the required number of each of the carbon atoms in Cj is defined to be |Qi|/|Cj|. For pair (c) (Q3 and C4), |Q3|/|C4| = 2/1 = 2, so the carbon atom with 126.0 ppm must be duplicated.

(5) According to the required numbers for atoms determined in (4), calculate the required numbers for edges. For example, the required number for the edges between carbon atoms with 141.9 and 129.1 ppm is four. For the pair of 126.0 and 128.8 ppm, the required number is also four, since the carbon atom with 126.0 ppm is duplicated in the previous step.

(6) Now we are ready to apply the graph inference algorithm. Using the required numbers calculated in (4) and (5), a structure with C-shift assignment shown in Figure 4 (ii) is elucidated.
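The arithmetic of steps (1) to (5) can be traced with a short sketch. Several parts are assumptions rather than the authors' implementation: shifts are grouped by starting a new group whenever the gap between consecutive sorted values exceeds α, the structural-class split of step (2) is skipped because C2 and C3 are re-merged in step (3) anyway, groups are paired in sorted order, and the edge counts are obtained by simply counting label pairs in the correct bibenzyl skeleton as a consistency check. All names (`group_shifts`, `RING_BONDS`, `SHIFT`) are hypothetical.

```python
from collections import Counter

ALPHA = 1.0  # tolerance in ppm, taken from the text


def group_shifts(shifts, alpha):
    """Assumed grouping rule: sort descending and open a new group
    whenever the gap to the previous value exceeds alpha."""
    groups = []
    for value in sorted(shifts, reverse=True):
        if groups and groups[-1][-1] - value <= alpha:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


# Step (1): query shifts of bibenzyl -> Q1..Q4.
query = [141.8] * 2 + [128.5] * 4 + [128.4] * 4 + [125.9] * 2 + [38.0] * 2
q_groups = group_shifts(query, ALPHA)

# Steps (2)-(3): shifts of structure C; grouping by shift alone gives
# C1, C6 (= C2 merged with C3), C4, C5 directly.
structure_c = [141.9] * 2 + [129.1] * 4 + [128.8] * 4 + [126.0] + [37.4] * 2
c_groups = group_shifts(structure_c, ALPHA)

# Step (4): required number per carbon atom, |Qi| / |Cj|.
# Pairing by position assumes both lists contain matching groups in order.
required_per_atom = {}
for q, c in zip(q_groups, c_groups):
    for shift in set(c):
        required_per_atom[shift] = len(q) // len(c)

# Step (5), consistency check only: count label-pair edges in the correct
# bibenzyl skeleton, with atoms labeled by the shifts of structure C.
RING_BONDS = [("ipso", "o1"), ("ipso", "o2"), ("o1", "m1"), ("o2", "m2"),
              ("m1", "para"), ("m2", "para"), ("ipso", "ch2")]
SHIFT = {"ipso": 141.9, "o1": 129.1, "o2": 129.1, "m1": 128.8,
         "m2": 128.8, "para": 126.0, "ch2": 37.4}

edge_counts = Counter()
for _ring in range(2):  # two identical benzyl halves
    for u, v in RING_BONDS:
        edge_counts[tuple(sorted((SHIFT[u], SHIFT[v])))] += 1
edge_counts[(SHIFT["ch2"], SHIFT["ch2"])] += 1  # central CH2-CH2 bond

print([len(g) for g in q_groups])    # [2, 8, 2, 2]
print([len(g) for g in c_groups])    # [2, 8, 1, 2]
print(required_per_atom[126.0])      # 2
print(edge_counts[(129.1, 141.9)])   # 4
print(edge_counts[(126.0, 128.8)])   # 4
```

The two edge counts printed at the end match the required numbers quoted in step (5). In the actual method these numbers are derived from structure C and the atom multiplicities rather than from the known answer.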


We have developed a new method to handle molecules with high symmetry in CAST/CNMR Structure Elucidator by applying a graph inference algorithm. The concept was demonstrated using bibenzyl. The new elucidator will also be applied to more complex structures.

We are preparing to release the CAST/CNMR program package to the public. Until then, please contact any of the authors if you are interested in trying the program.

S.K. was partly supported by Nanzan University Pache Research Subsidy I-A-2 for the 2014 academic year.

© 2016 Society of Computer Chemistry, Japan