Mass Spectrometry
Online ISSN : 2186-5116
Print ISSN : 2187-137X
ISSN-L : 2186-5116
Original Article
Method for the Compound Annotation of Conjugates in Nontargeted Metabolomics Using Accurate Mass Spectrometry, Multistage Product Ion Spectra and Compound Database Searching
Tairo Ogura, Takeshi Bamba, Akihiro Tai, Eiichiro Fukusaki

2015 Volume 4 Issue 1 Pages A0036


Owing to biotransformation, xenobiotics are often found in conjugated form in biological samples such as urine and plasma. Liquid chromatography coupled with accurate mass spectrometry and multistage collision-induced dissociation provides spectral information on these metabolites in complex materials. Unfortunately, compound databases typically do not contain a sufficient number of records for such conjugates. We report here the development of a novel protocol, referred to as ChemProphet, that annotates compounds, including conjugates, using compound databases such as PubChem and ChemSpider. The annotation of conjugates involves three steps: 1. Recognition of the type and number of conjugates in the sample; 2. Compound search and annotation of the deconjugated form; and 3. In silico evaluation of the candidate conjugate. ChemProphet assigns a spectrum to each candidate by automatically exploring the substructures corresponding to the observed product ion spectrum. It then ranks the candidates by a calculated score that reflects their relative likelihood. We assessed our protocol by annotating a benchmark dataset comprising the product ion spectra of 102 compounds, by annotating a commercially available standard of quercetin 3-glucuronide, and by conducting a model experiment using urine from mice that had been administered a green tea extract. The results show that ChemProphet can annotate not only the deconjugated molecules but also the conjugated molecules, using an automatic interpretation method based on deconjugation by multistage collision-induced dissociation and on in silico generated conjugates.


The conjugation reaction increases the hydrophilicity of a compound by attaching a hydrophilic molecule to it.1) An analysis of urine, one of the main destinations for excreted conjugates, reveals the presence of a wide range of conjugated compounds, originating not only from endogenous compounds but also from compounds derived from ingested food.2,3) In nontargeted metabolomics, a technique that is not limited to specific compounds, it is important to analyze these conjugates. Liquid chromatography coupled to mass spectrometry (LC/MS) is frequently used for the detection and structural analysis of conjugates because it has a higher sensitivity than NMR, despite the fact that it has a lower structural determination capability compared to NMR.4,5)

The product ion spectrum obtained by collision-induced dissociation (CID) in a mass spectrometer provides information regarding the substructure of a compound. However, manual interpretation of the data for spectrum annotation is a cumbersome task that requires specific knowledge of mass spectrometry as well as of the target compound. A computational approach can overcome these problems by providing an automatic interpretation.6–9) We previously reported an automated annotation system that interprets the multistage CID (MSn) spectra from a nontargeted analysis and then retrieves candidates from a compound database.10) We were able to successfully annotate 20 components that contributed to tea quality, such as caffeine, catechins, and a series of organic acid esters. However, the identification of conjugates in nontargeted metabolomics remains a challenge, because the number of conjugated compounds in most databases is still limited. For example, a substructure search for compounds containing glucuronic acid in ChemSpider returned only 9,556 compounds, despite the fact that the database contains more than 32 million records (2014/9). Structural analyses of conjugates derived from known compounds are used in studies of drug metabolism.4) Ridder et al. applied this approach to 75 known green tea components and generated 27,245 theoretical metabolites. As a result, they annotated 97 potential components, including 75 compounds that are not listed in PubChem.11) Because comprehensive information concerning both the ingested components and their metabolism is required, this technique cannot be easily applied to unknown components and metabolic pathways.
Mass spectrometry following sample preparation through chemical or enzymatic deconjugation is also a known method for the structural analysis and identification of conjugates.12,13) Because the conjugated form cannot be directly identified using this method, a method that does not involve a deconjugation procedure is generally preferred.14)

In this study, we re-designed the compound annotation protocol and developed a method which we refer to as ChemProphet that can annotate compounds by taking into consideration conjugated forms using a compound database. ChemProphet uses an MSn spectrum to identify the type and number of conjugates, annotates the compounds in a deconjugated form based on a database search, and then evaluates the conjugated forms generated in silico.

We conducted compound annotation using a benchmark dataset in order to evaluate the performance of the ChemProphet protocol. We also analyzed chemical standards and confirmed the capability of our method to annotate the conjugates. Lastly, we annotated the conjugates present in the urine of mice that had been administered green tea. Green tea components15) and their metabolites have previously been identified using NMR16,17) and LC-MS12,16–20) analysis of authentic standards. We attempted to annotate these conjugated metabolites in a model experiment.


Retrieval of candidates from the compound database

For the benchmark test, we retrieved candidates from PubChem by querying the monoisotopic mass, as shown in Table S1. We used ChemSpider for the other experiments, where the candidates were retrieved by querying the predicted formulae. Duplicate structures and stereoisomers were eliminated.
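The source does not say how duplicates and stereoisomers were detected. As a minimal sketch of one possible approach, the Python below assumes each retrieved record carries a standard InChIKey and treats records that share the first (connectivity) block of the key as the same structure. The `Candidate` class, the record identifiers, and the placeholder keys are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    record_id: str  # hypothetical database identifier
    inchikey: str   # standard InChIKey: connectivity block, stereo/isotope block, flag


def drop_stereo_duplicates(candidates):
    """Keep the first candidate seen for each connectivity (first InChIKey block)."""
    seen = set()
    unique = []
    for cand in candidates:
        skeleton = cand.inchikey.split("-")[0]
        if skeleton not in seen:
            seen.add(skeleton)
            unique.append(cand)
    return unique


# Placeholder keys, not real compounds: the first two differ only in the second block,
# which is where stereochemistry is encoded.
hits = [
    Candidate("rec-1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    Candidate("rec-2", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    Candidate("rec-3", "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),
]
unique_hits = drop_stereo_duplicates(hits)
print([c.record_id for c in unique_hits])
```

Collapsing on the connectivity block removes exact duplicates and stereoisomers in a single pass, at the cost of also merging isotopologues, which may or may not be desired.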

Spectrum assignment, scoring, and ranking

ChemProphet searches for appropriate substructures whose molecular formula corresponds to the m/z value of the observed product ion. Rearrangements were not taken into consideration. ChemProphet calculates the final candidate score based on three different scores: the formula score, spectrum score, and penalty score. The penalty score is calculated based on the penalty value and the relative intensity Eq. (1).   

where single bond: p=1, double bond: p=2, triple bond: p=3; bond including a non-carbon atom: h=1, carbon–carbon bond: h=2, ketone bond: h=3.

The penalty value was calculated by Eq. (2).   

where i indexes the product ions, PVi is the minimum penalty value for each product ion, RIi is the relative intensity (%) of each product ion, and n is the number of product ions.
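The formula-matching part of the assignment step described above can be illustrated with a small sketch. It is not ChemProphet's implementation: it only checks whether a candidate fragment formula, assumed here to be observed as a deprotonated [M−H]− ion, agrees with an observed m/z within a tolerance. The function names and the 5 ppm tolerance are assumptions chosen for illustration. The example uses the deprotonated quercetin ion (C15H10O7, observed at m/z 301.0354) reported in this article.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
PROTON = 1.00727646  # mass of a proton (u)


def neutral_mass(formula):
    """Monoisotopic mass of a formula such as 'C15H10O7'."""
    total = 0.0
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[elem] * (int(num) if num else 1)
    return total


def deprotonated_mz(formula):
    """Theoretical m/z of the singly charged [M-H]- ion (assumed ion type)."""
    return neutral_mass(formula) - PROTON


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def matches(observed_mz, formula, tol_ppm=5.0):
    """True if the formula explains the observed m/z within tol_ppm (assumed tolerance)."""
    return abs(ppm_error(observed_mz, deprotonated_mz(formula))) <= tol_ppm


print(round(deprotonated_mz("C15H10O7"), 4))
print(matches(301.0354, "C15H10O7"))
```

A real assignment step would additionally have to enumerate substructures of each candidate and test their formulae in this way; that enumeration is outside the scope of this sketch.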

The rank of each candidate was determined by the final score. For the rank calculation, we used the “lowest rank,” defined as the sum of the number of candidates with a better score and the number of candidates with the same score. In this study, product ions with relative intensities of less than 5% were ignored.
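The “lowest rank” rule and the 5% intensity cutoff can be written down in a few lines. The sketch below rests on two assumptions that the text does not state explicitly: a higher final score is better, and the count of candidates with the same score includes the candidate itself, so that a unique best candidate receives rank 1. The candidate names, scores, and peak list are toy values.

```python
def filter_product_ions(peaks, min_rel_intensity=5.0):
    """Drop product ions below the relative-intensity cutoff.

    peaks: list of (m/z, relative intensity in %) tuples.
    """
    return [(mz, ri) for mz, ri in peaks if ri >= min_rel_intensity]


def lowest_rank(scores, target):
    """Pessimistic rank: candidates scoring better plus candidates tied with the target.

    scores: dict mapping candidate name -> final score (higher is better, assumed).
    The tie count includes the target itself (assumed).
    """
    target_score = scores[target]
    better = sum(1 for s in scores.values() if s > target_score)
    same = sum(1 for s in scores.values() if s == target_score)
    return better + same


toy_scores = {"cand_a": 0.90, "cand_b": 0.75, "cand_c": 0.75, "cand_d": 0.40}
print({name: lowest_rank(toy_scores, name) for name in toy_scores})

toy_peaks = [(151.0, 100.0), (179.0, 42.0), (107.0, 4.9)]
print(filter_product_ions(toy_peaks))
```

Reporting the lowest rank avoids overstating performance when several candidates are indistinguishable by score: tied candidates all receive the worst position within the tie.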

In silico prediction of conjugates

Conjugated forms are generated for selected candidates using the annotation results of a MS3 analysis. Specific groups involved in the conjugation were considered. Hydroxyl and amino groups were considered as substrates for sulfation, while hydroxyl, carboxyl, amino, and thiol groups were considered as substrates for glucuronidation.
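At the level of molecular formulae, generating a conjugate amounts to adding a fixed group to the deconjugated formula: sulfation adds SO3 and glucuronidation adds C6H8O6. The sketch below illustrates this bookkeeping together with the substrate-group rules stated above. The function and dictionary names are hypothetical, and functional groups are represented as plain labels rather than being detected from a structure. The formulae in the example are those listed in Table 2.

```python
import re
from collections import Counter

# Net change in elemental composition for each conjugation type.
CONJUGATION_DELTA = {
    "sulfate": Counter({"S": 1, "O": 3}),
    "glucuronide": Counter({"C": 6, "H": 8, "O": 6}),
}

# Functional groups accepted as conjugation sites, as stated in the text.
SUBSTRATE_GROUPS = {
    "sulfate": {"hydroxyl", "amino"},
    "glucuronide": {"hydroxyl", "carboxyl", "amino", "thiol"},
}


def parse_formula(formula):
    counts = Counter()
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] += int(num) if num else 1
    return counts


def to_hill(counts):
    """Format counts in Hill order: C, H, then the remaining elements alphabetically."""
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order)


def conjugate_formula(formula, kind, n=1):
    """Formula after attaching n conjugate groups of the given kind."""
    counts = parse_formula(formula)
    for _ in range(n):
        counts += CONJUGATION_DELTA[kind]
    return to_hill(counts)


def eligible_sites(groups, kind):
    """Indices of the labeled functional groups that can accept the conjugate."""
    return [i for i, g in enumerate(groups) if g in SUBSTRATE_GROUPS[kind]]


print(conjugate_formula("C11H12O4", "sulfate"))      # valerolactone row of Table 2
print(conjugate_formula("C15H14O7", "glucuronide"))  # gallocatechin row of Table 2
print(eligible_sites(["hydroxyl", "carboxyl", "hydroxyl"], "sulfate"))
```

A structure-level generator would attach the group at each eligible site in turn and produce one candidate per site; the formula arithmetic above is the same for all of them.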

Materials and analytical conditions

Chemicals and reagents, sample preparation, instrumentation, and analytical conditions are described in Supplementary materials.


Structure assignment and ranking using a benchmark dataset

Although we previously used a commercial software program for predicting product ions, for this study we developed ChemProphet, which is equipped with newly designed automatic assignment and scoring procedures. The automatic assignment procedure consists of predicting the formula of the observed product ion, searching for substructures that match the predicted formula, and evaluating the neutral loss. As a result, the ratio of assigned product ions improved, particularly for negative ions (data not shown). On the other hand, this overall increase in the ratio of assigned ions created the need for an additional assessment to distinguish the preferable substructure. Therefore, we added a penalty score to evaluate the likelihood of bond dissociation. The penalty score was calculated based on the penalty value. The penalty value was proposed by Hill and Mortishire-Smith et al.,21) and a modified penalty value was also used by Ridder et al.9) A special penalty for a ketone bond was added for ChemProphet. We calculated the penalty score to normalize the difference in the maximum penalty value between the various candidates, and to take the intensity of each product ion into account. The spectrum score was calculated in a manner similar to that for the structure score described previously.10)

Several computational annotation protocols have been reported.6–9) Hill et al. developed a protocol using Mass Frontier Ver. 4, and the evaluation involved the annotation of product ion spectra derived from 102 compounds.6) Wolf et al. developed the MetFrag protocol, which annotates spectra by comparison with computational fragmentation and scores candidates using the bond dissociation energy of the cleaved bond.8) Ridder et al. developed an annotation protocol referred to as MAGMa for MSn analysis using computational assignment and scoring based on a penalty value.9)

Like Wolf et al. and Ridder et al., we used the benchmark dataset reported by Hill et al. To allow the results to be compared, we generated a merged product ion spectrum and retrieved candidates from PubChem in a manner similar to a previous report.8) The results of the annotation are shown in Tables 1 and S2–S5. Although the results could not be compared precisely, because the number of retrieved candidates was different, our protocol showed better results than previous reports in terms of average, median, and third quartile rank. The processing time of ChemProphet was more varied than that of the others (Figs. S1 and S2). This difference in time was caused by the algorithm used for the assignment, because ChemProphet searches substructures from scratch for each formula.

Table 1. Statistical results of ranking the benchmark dataset in comparison to published studies.
| | ChemProphet | Hill et al. a) | Wolf et al. a) | Ridder et al. b) |
| --- | --- | --- | --- | --- |
| Product ion | Merged | Selected | Merged | Merged |
| Average rank | 14.7 (+/−3.6) | 44.2 (+/−14.1) | 24 (+/−7.9) | 30.8 (+/−9.8) |
| Std. deviation | 36.6 | 142.5 | 80.2 | 98.9 |
| Median rank | 3 | 4 | 4.5 | 3 |
| 3rd quartile rank | 8 | 17.5 | 11.75 | 9 |

a) The rank for each candidate was supplied in the additional file published by Wolf et al.
b) The rank for each candidate was supplied in the supporting information published by Ridder et al.

Assignment of the MSn spectra of conjugates using the quercetin 3-glucuronide standard

The annotation process for the conjugates involves three main phases: 1. Prediction of the deconjugated formula based on an assessment of the type and number of possible conjugate reactions; 2. Assignment and ranking of the deconjugated candidates retrieved from the compound database; and 3. Evaluation of the in silico generated conjugates based on the ranking of the deconjugated form, as shown in Fig. S3. The type of conjugation can be recognized by a neutral loss.5) We automatically acquired the product ion spectra for the deconjugated ion using a data-dependent MS3 analysis triggered by the neutral loss observed in a MS2 analysis. The number of conjugate reactions can be determined by the depth of the data-dependent MSn analysis.
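Recognizing the conjugate type from a neutral loss, and counting conjugations from the depth of the MSn chain, can be sketched as follows. The neutral-loss masses are the monoisotopic masses of C6H8O6 (glucuronide) and SO3 (sulfate). The 0.005 Da tolerance, the function names, and the representation of an MSn chain as a list of precursor m/z values are assumptions for illustration. In the example, 301.0354 is the deprotonated quercetin ion reported in this article, while 477.0675 is a theoretical [M−H]− value calculated here for C21H18O13, not a measured one.

```python
# Monoisotopic neutral-loss masses (u).
NEUTRAL_LOSSES = {
    "glucuronide": 176.03209,  # C6H8O6
    "sulfate": 79.95681,       # SO3
}


def classify_neutral_loss(precursor_mz, product_mz, tol=0.005):
    """Return the conjugate type whose neutral loss matches, or None."""
    loss = precursor_mz - product_mz
    for kind, mass in NEUTRAL_LOSSES.items():
        if abs(loss - mass) <= tol:
            return kind
    return None


def conjugations_in_chain(mz_chain, tol=0.005):
    """Classify each step of an MSn precursor chain (MS2 precursor, MS3 precursor, ...).

    Steps that do not match a known conjugate loss are skipped, so the length
    of the returned list is the number of recognized conjugations.
    """
    found = []
    for parent, child in zip(mz_chain, mz_chain[1:]):
        kind = classify_neutral_loss(parent, child, tol)
        if kind is not None:
            found.append(kind)
    return found


print(classify_neutral_loss(477.0675, 301.0354))
print(conjugations_in_chain([477.0675, 301.0354]))
```

In a data-dependent acquisition, a recognized loss of this kind would be the trigger for acquiring the next MSn stage on the deconjugated ion.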

We analyzed quercetin 3-glucuronide as a standard compound and annotated its MSn spectra to evaluate the capability of our protocol to annotate conjugates. In the MS2 spectrum, the deprotonated quercetin molecule was observed as a product ion at m/z 301.0354. Subsequently, the MS3 spectrum of quercetin was acquired automatically. A list of predicted formulae including the correct formula, C21H18O13, was obtained. ChemSpider returned 59 candidates when queried with the deconjugated formulae. After annotation of the MS3 analysis with the retrieved candidates, quercetin ranked 15th. After annotation of the MS2 analysis with the in silico generated glucuronide conjugates, quercetin 3-glucuronide ranked 10th, and the nine candidates other than CSID 9718758 showed the same score (Fig. 1). These candidates had a similar backbone structure and the same position of conjugation.

Fig. 1. The final ten candidates for quercetin 3-O-D-glucuronide. Each candidate is named by ChemSpiderID (CSID) and by the type of conjugation.

Analysis of conjugates of green tea components in mouse urine

In a model experiment, we analyzed the urine of mice that had been administered a green tea extract, and searched for urinary conjugates derived from the green tea components. We used negative mode ESI because these conjugates have acidic moieties attached. Representative chromatograms are shown in Fig. S4. A total of 3,105 peaks were detected. An OPLS regression analysis was used to extract the components of the green tea and its metabolites (Fig. 2). An assessment of significant neutral loss and an S-plot showed that four peaks, at m/z 287, 369, 399, and 481, corresponded to conjugates of green tea components.

Fig. 2. Possible metabolites of green tea in a urine extract shown by score plot (a) and S-plot (b) of an OPLS regression analysis. The m/z value for each conjugate is indicated in the S-plot.

Table 2 provides a summary of the annotated candidates for these four features. Figure 3 shows the result of the assignment for di-hydroxyphenyl-γ-valerolactone sulfate. Here, the structure of the 3′-O-sulfate is shown, but the two positions assigned for 3′-O-sulfate and 4′-O-sulfate were not differentiated. Di-hydroxyphenyl-γ-valerolactone is a known metabolite that is produced by the C-ring cleavage of catechin. Li et al. identified di-hydroxyphenyl-γ-valerolactone-O-sulfate as a metabolite of green tea in human urine using NMR16) and mass spectrometry.18) Catechin-O-sulfate, O-methyl-gallocatechin-O-sulfate, and gallocatechin-O-glucuronide were previously reported as urinary metabolites derived from catechin in green tea, using LC-MS analysis with comparison to an authentic standard and using NMR.12,16,18) As a result of the annotation of the generated sulfate conjugate, the scores for 3′- and 4′-O-sulfate were relatively high (Figs. S5 and S6). Romanov-Michailidis et al. synthesized the conjugated sulfate and detected it in human biological fluids.22) The product ion spectra of the synthetic sulfate conjugate showed that m/z 231 corresponds to the 3′- or 4′-O-sulfate, and the product ion at m/z 247 corresponds to the 5- or 7-O-sulfate.23) Compared to previous results, the assignment results for O-methyl-gallocatechin also suggest that a sulfate conjugated to the B-ring is produced. On the other hand, catechin-O-sulfate and gallocatechin-O-glucuronide were not reported by Ridder et al.11) The result of retrieving the annotated conjugate from ChemSpider is also shown in Table 2. No candidates were found in the database.

Table 2. Results of putative annotation for green tea metabolites.
| m/z of conjugate | m/z 287 | m/z 369 | m/z 399 | m/z 481 |
| --- | --- | --- | --- | --- |
| Formula (conjugate) | C11H12O7S | C15H14O9S | C16H16O10S | C21H22O13 |
| Formula (deconjugate) | C11H12O4 | C15H14O6 | C16H16O7 | C15H14O7 |
| Compound name and structure (deconjugate) | Di-hydroxyphenyl-γ-valerolactone | Catechin | O-Methyl-gallocatechin | Gallocatechin |
| Rank of deconjugated compound | 1 | 7 | 5 | 4 |
| # of candidates | 1,015 | 412 | 514 | 203 |
| Existence of conjugated form in ChemSpider | Not listed | Two isomers other than candidate | Not listed | Four isomers including candidate |
| Predicted site of conjugation | 3′-O or 4′-O | 3′-O or 4′-O | 3′-O | Not specified |

Fig. 3. Product ion mass spectra after an MS1–3 analysis of the peak at m/z 287 and the automatically assigned substructure of the di-hydroxyphenyl-γ-valerolactone sulfate conjugate. The precursor ion is indicated as Prec. The theoretically calculated m/z values are shown under each structure.


Although compound databases are being improved each day, they are still incomplete in terms of information regarding conjugates. Thus, we developed a new system, referred to as ChemProphet, for annotating conjugates that are not found in databases. It also includes a newly designed automatic assignment and scoring process. We assessed the developed annotation protocol using a benchmark dataset and compared our results to those from previous studies. We also demonstrated that ChemProphet can annotate conjugates, using quercetin 3-glucuronide and urinary conjugates derived from green tea components. This protocol can be applied to annotating conjugates in biological samples that are derived from components of not only food and beverages2) but also other complex materials such as waste water.24) In this report, we focused on glucuronide and sulfate groups as important urinary conjugates. However, there are other reactions worth investigating, such as glutathione conjugation, glycine conjugation, methylation, and acetylation. The ChemProphet approach can be applied to these conjugates in the same way, because these conjugation reactions are also known to generate specific neutral losses.5)


The study represents a portion of the dissertation submitted by Tairo Ogura to Osaka University in partial fulfillment of the requirement for his Ph.D.

© 2015 The Mass Spectrometry Society of Japan