Identification of Anthropogenic Compounds in Urban Environments and Evaluation of Automated Methods for Reading Fragmentation—A Case of River Water

A workflow based on liquid chromatography/high-resolution mass spectrometry (LC/HR-MS) was applied for the identification of compounds in urban environments. Substances extracted by solid-phase extraction from river water were analyzed as a whole by LC/HR-MS without any purification. Fragmentation in collision-induced dissociation was manually studied for the 20 most intense ions in positive- and negative-ion electrospray ionization with accurate mass determination at a resolution of 100,000. Sixteen anthropogenic compounds in the extract were identified and confirmed using standard reference reagents. These compounds consisted of pharmaceuticals, surfactants, flame retardants, and industrial intermediates. The majority of the compounds are common in daily life. In the identification process, two automated methods for reading fragmentation, MAGMa and MetFrag/MetFusion, were evaluated for the sixteen compounds. Although the automated methods retrieved the correct molecular structures in most cases, they did not always place them at the top rank. Automated methods are not yet a complete solution for identifying chemical compounds, but they will considerably reduce the burden on humans in reading fragmentation.


INTRODUCTION
Mass spectrometry (MS) is a key technology for the identification and structural elucidation of small molecules. The molecular formulas of small molecules can be determined from accurate mass values measured by high-resolution (HR) MS because the relative atomic masses of elements other than carbon are not exact integers. Furthermore, there is a close relationship between molecular structure and the reactivity of ions in MS, so information on molecular structure can be obtained by interpreting the fragment ions in the mass spectra. HR-MS can make the interpretation of mass spectra fairly certain due to its high accuracy and precision. Although a mass spectral library of reference standards is quite helpful in the process of interpreting mass spectra, existing libraries are insufficient for reading all observed data. Of course, attempts to enrich the libraries are continuing, but the situation is not improving rapidly. In addition, ionization and fragmentation are sometimes specific to the instrumental conditions, particularly in liquid chromatography/mass spectrometry, which is often applied to polar compounds. Therefore, manual interpretation of the resulting data continues to be time consuming and cumbersome.
During the last five years, several promising automated methods for interpreting accurate mass spectral data have appeared on the scene. One of the methods is the utilization of public molecular structure databases such as KEGG (Kyoto Encyclopedia of Genes and Genomes), PubChem, and ChemSpider, which are more comprehensive than mass spectral libraries. Insufficiency in mass spectral libraries is considered serious even in the field of environmental research because the number of unregulated emerging micropollutants has increased over the last few decades. The environmental risk caused by unregulated pollutants is of great concern, and environmental researchers are always finding new targets. Many have been studied, and some of them have been brought under regulation. However, there is no method by which environmental researchers can evaluate all emerging pollutants and their transformation products.
Regulation has been implemented in order of priority level, but potential threats to ecosystems and human health by overlooked compounds remain.
During the last decade, new methodology coupled with HR-MS, referred to as suspect screening and nontarget analysis, has been introduced in environmental analysis. 1–5) Hug et al. 6) and Schymanski et al. 7) found novel micropollutants and transformation products using their nontarget strategies. In this study, we used two automated methods, MAGMa 8) and MetFrag/MetFusion, 9,10) to identify compounds in an urban aquatic environment and evaluated the two methods using the results. The methods are available on the Internet and they performed well in a public contest, CASMI2013 (Critical Assessment of Small Molecule Identification). 11)

Sample preparation
In February 2015, 500 mL of river water was collected from the Neyagawa River, Osaka City, Japan, and passed through a solid-phase extraction cartridge, Presep-C Agri Short (divinylbenzene polymethacrylate copolymer sorbent, 200 mg, Wako Pure Chemical Industries, Ltd.), without pH adjustment. After drying at reduced pressure on a vacuum manifold assembly, substances that had adsorbed on the cartridge were eluted with 5 mL of acetonitrile. The eluate was analyzed by LC/MS without further cleanup. A blank experiment using ultrapure water, which was generated by a TORAYPURE LV-10T (Toray, Tokyo, Japan), was also carried out.

Instrumental
For nontarget analyses, an LC/MS system consisting of an Ultimate 3000 (Dionex, Sunnyvale, CA, USA) and an Exactive Orbitrap Mass Spectrometer (Thermo Fisher Scientific Inc., Waltham, MA, USA) was used. Both positive- and negative-ion electrospray ionization (ESI) were applied. Mass spectra were acquired from m/z 100 to 2,000 with a resolution of 100,000 and a mass accuracy of 5 ppm at m/z 200. Mass calibration was carried out with a standard calibration mix including Ultramark 1621, sodium dodecyl sulfate, caffeine, and peptide MRFA (Thermo). For observation of fragment ions, all ion fragmentation (AIF) mode was applied at 25 eV. In the AIF mode, multiple dissociation techniques such as in-source collision-induced dissociation and the higher-energy collisional dissociation cell were employed. Data were acquired both under MS mode and under AIF mode. Determination of elemental composition from accurate mass measurement was carried out with Thermo Fisher Xcalibur Software. The LC column was an ODS-100S column (2.0 mm×150 mm, 5 µm; Tosoh Corp., Tokyo, Japan). A gradient condition consisting of 2 mM ammonium bicarbonate in ultrapure water (A) and methanol (B) was adopted. The gradient, expressed as changes in mobile phase B, was as follows: 0-2 min, hold at 30% B; 2-15 min, a linear increase from 30% to 100% B; 15-20 min, hold at 100% B; 20-25 min, equilibration at 30% B. Another ODS-100S column was installed before the autosampler to delay elution of substances from the LC system. Confirmation of substances with reference reagents was conducted using a triple quadrupole mass spectrometer. A system consisting of an Agilent 1100 (Agilent Technologies, Palo Alto, CA, USA) and an API 2000 (Sciex, Foster City, CA, USA) was used. The condition of selected reaction monitoring in ESI was optimized for each substance.

Approaches to explain the mass spectra examined in this study
MAGMa
Ridder et al. introduced an automated method for the interpretation of accurate mass spectra in 2012. 8) The method is based on an algorithm for candidate substructure annotation of multistage accurate mass spectral trees without relying on a spectral library. First, candidate structures are retrieved from a compound database by querying on monoisotopic mass. Next, fragmentation of structural skeletons (i.e., non-hydrogen atoms) is performed by removing each non-hydrogen atom sequentially and collecting substructures. During the fragmentation, a simple penalty score that depends on the type of bond is taken into account. Then the substructures generated in silico are assigned to accurate m/z values in a mass spectrum. Candidate molecular structures are retrieved from chemical compound databases such as PubChem, the Human Metabolome Database (HMDB), and KEGG. The online version of MAGMa has a user-friendly web interface, http://www.emetabolomics.org/magma/. This method was selected as the best automated tool in the international contest CASMI2013, which tested the ability to explain blind mass spectral data. 11) In the present study, PubChem was selected as the compound database.
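The atom-removal step described above can be illustrated with a minimal sketch. This is not the MAGMa implementation: the dictionary-based skeleton, the function names, the `max_depth` parameter, and the toy molecule are all assumptions made for illustration, and the bond-type penalty score and the matching to m/z values are omitted.

```python
from collections import deque


def connected_components(adjacency, atoms):
    """Split a set of heavy atoms into connected substructures."""
    remaining = set(atoms)
    components = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(frozenset(component))
    return components


def enumerate_substructures(adjacency, max_depth=2):
    """Remove one heavy atom at a time (up to max_depth rounds) and
    collect every connected substructure that is left behind."""
    full = frozenset(adjacency)
    seen = {full}
    frontier = [full]
    for _ in range(max_depth):
        next_frontier = []
        for fragment in frontier:
            for atom in fragment:
                for part in connected_components(adjacency, fragment - {atom}):
                    if part not in seen:
                        seen.add(part)
                        next_frontier.append(part)
        frontier = next_frontier
    seen.discard(full)  # report only proper substructures
    return seen


# Hypothetical toy skeleton: heavy atoms of propan-1-ol, C1-C2-C3-O4.
skeleton = {
    "C1": ["C2"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "O4"],
    "O4": ["C3"],
}
fragments = enumerate_substructures(skeleton, max_depth=1)
for fragment in sorted(fragments, key=lambda f: (len(f), sorted(f))):
    print(sorted(fragment))
```

In a real tool each substructure would then be converted to a candidate m/z value (allowing for hydrogen differences) and compared with the observed peaks; that step is left out here.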

MetFrag/MetFusion
MetFrag was developed in 2010 by Wolf et al. MetFrag also simulates fragmentation to explain ions in a mass spectrum. 9) Before bond disconnection of candidate structures, a small set of rules for molecular rearrangements is applied. Then all bonds to be disconnected are labeled linear or ring. A redundancy check is performed at every disconnection step.
The in silico fragments are matched against the query peak list. Candidate molecular structures are retrieved from a chemical compound database such as PubChem, KEGG, or ChemSpider. A web application is available at the following URL: http://msbi.ipb-halle.de/MetFrag/. MetFusion is the successor of MetFrag and was released by Gerlich et al. in 2013. 10) MetFusion incorporates similarity to spectra in a mass spectral library into MetFrag scoring. Its spectral libraries are MassBank, HMDB, NIST'11, and METLIN. As with MAGMa, PubChem was selected as the compound database.
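The labeling of bonds as linear or ring can be sketched as a simple graph test: a bond belongs to a ring if its two atoms remain connected after that bond is removed. The code below is an illustrative assumption, not the MetFrag source; the function names and the toy skeleton are hypothetical.

```python
def is_ring_bond(adjacency, a, b):
    """Return True if atoms a and b stay connected when the a-b bond is cut."""
    stack = [a]
    visited = {a}
    while stack:
        atom = stack.pop()
        for neighbor in adjacency[atom]:
            if atom == a and neighbor == b:
                continue  # skip the bond under test
            if neighbor == b:
                return True  # reached b by another path, so a ring exists
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append(neighbor)
    return False


def label_bonds(adjacency):
    """Label every bond of the heavy-atom skeleton as 'ring' or 'linear'."""
    labels = {}
    for a in adjacency:
        for b in adjacency[a]:
            bond = tuple(sorted((a, b)))
            if bond not in labels:
                labels[bond] = "ring" if is_ring_bond(adjacency, a, b) else "linear"
    return labels


# Hypothetical toy skeleton: methylcyclopropane (ring C1-C2-C3, methyl C4 on C1).
skeleton = {
    "C1": ["C2", "C3", "C4"],
    "C2": ["C1", "C3"],
    "C3": ["C1", "C2"],
    "C4": ["C1"],
}
bond_labels = label_bonds(skeleton)
for bond, label in sorted(bond_labels.items()):
    print(bond, label)
```

The distinction matters for fragmentation because cutting a linear bond splits the molecule in two, whereas a ring must be cut at two bonds before a fragment is released.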

Preparation of queries for automated methods
Initially, obvious noise and substances detected in the blank experiment were excluded from the examination for compound identification. The intense ions in the mass spectra were extracted from each LC/MS analysis according to intensity. In ESI, without prior information, we usually cannot know which feature a given ion represents. The 20 most intense features were extracted. After the assignment of precursor ions, analyses were carried out in AIF mode. Ions with the same chromatographic peak shapes as precursor ions were extracted to find fragment ions, in the same way as in the assignment of a feature. Ions whose m/z values differed greatly from that of the precursor ion in the first decimal place were excluded. A list of accurate masses of precursor and fragment ions was consolidated to obtain a virtual precursor ion and product ion relationship for each feature. This process is quite laborious but necessary because our Exactive is not a multistage mass spectrometer. The list was converted to appropriate formats depending on the requirements of each automated method. All m/z and peak area values used for evaluating automated methods are summarized in the electronic supplementary material (Table S7).

[Fig. 1]

Table 2 is a list of the m/z values, peak areas, and isotopologue compositions of the suspected fragment ions of P1. The isotopologue composition indicated the absence of elements that give a characteristic signal at X+2, such as chlorine, bromine, and sulfur, and the presence of roughly 15 carbon atoms. In addition, the presence of C7H7+ (P1AIF1 in Table 2) suggested a benzyl or tolyl substructure. The candidate formula of fragment ion P1AIF7 corresponded to the neutral loss of fragment ions P1AIF3 or P1AIF4 in Table 2. The values of ring and double bond equivalents (RDB) of P1AIF3, 4, and 7 were fairly consistent with the inclusion of a tolyl or benzyl substructure. Therefore a highly symmetric structure was suspected.
Although the m/z value, 240.1497, has several candidate formulas such as C13H23NOP, C15H18N3, and C17H20O, only C15H18N3 is consistent with the elemental compositions of the fragment ions of P1 because the precursor ion must include nitrogen atoms. Given that feature P1 is composed of two identical substructures, the consistent substructures were considered to be aminobenzyl, anilinomethyl, benzylamino, or toluidino groups. Each substructure contributes 4 to the RDB. Hence, the residual structure must consist of one atom each of carbon, hydrogen, and nitrogen, and one double bond, and must connect the two substructures at the carbon atom. Features with benzylamino or toluidino substructures can be considered to have a guanidine structure. Figure 2 shows the probable structures for P1, which correspond to the toluidino (A), benzylamino (B), anilinomethyl (C), and aminobenzyl (D) substructures. The cleavage positions of the bonds that generate the fragment ions listed in Table 2 are also shown in Fig. 2. As shown in Fig. 2, substance A could explain the generation of most fragment ions by simple scission. Consequently, substance A, a ditolylguanidine, was most likely, but the methyl position on the phenyl ring could not be determined from interpretation of the fragmentation.
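The comparison of candidate formulas for m/z 240.1497 can be reproduced with a short calculation. This is a sketch under stated assumptions: the formulas are treated as singly charged cations as written in the text, the monoisotopic masses are standard rounded values, and the helper names `cation_mz` and `ppm_error` are hypothetical.

```python
# Monoisotopic masses (u) of the elements involved, rounded to 8 decimals.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
}
ELECTRON = 0.00054858  # electron mass in u


def cation_mz(composition):
    """m/z of a singly charged cation whose formula is the ion formula itself."""
    mass = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    return mass - ELECTRON


def ppm_error(observed, theoretical):
    """Relative mass error of the observed value, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


observed = 240.1497
candidates = {
    "C13H23NOP": {"C": 13, "H": 23, "N": 1, "O": 1, "P": 1},
    "C15H18N3": {"C": 15, "H": 18, "N": 3},
    "C17H20O": {"C": 17, "H": 20, "O": 1},
}
results = {name: ppm_error(observed, cation_mz(comp)) for name, comp in candidates.items()}
for name, error in sorted(results.items(), key=lambda item: abs(item[1])):
    print(f"{name}: {cation_mz(candidates[name]):.4f} ({error:+.2f} ppm)")
```

Under these assumptions C15H18N3 gives the smallest mass error, which agrees with the choice made in the text on the basis of the fragment ion compositions.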

RESULTS AND DISCUSSION
Queries for MAGMa and MetFrag were prepared in the format required by each tool, using ions with a relative intensity of over 1% of the base peak, and submitted. Which fragment ions should be included in a query is difficult to define, because some ions are redundant while others are essential, and the criteria for judging this are not known in advance. As a first step, the intensity criterion was set to 1%. The numbers of candidates retrieved within a 3 ppm accuracy tolerance by MAGMa and MetFrag were 2,708 and 2,738, respectively. The top rank consisted of not just one compound but of five compounds for MAGMa and three compounds for MetFrag. In addition to the candidate score, which is based on the fragment ions, MAGMa has a second parameter called refscore. Refscore is estimated from the number of PubChem records for one compound. The compound with the top refscore was 1,3-di-o-tolylguanidine, which was one of the 14 compounds having the next best score. The identification was confirmed by LC/MS/MS with an authentic reagent purchased from Sigma-Aldrich (St. Louis, MO, USA). Therefore, 1,3-di-o-tolylguanidine was determined to be the correct answer. In the case of MetFrag, the answer ranked 100th with five incorrect compounds that had the same score as the answer. MetFusion could not improve the result. Then, queries with all detected fragment ions were submitted. The MAGMa answer ranked 16th with seven incorrect compounds; the MetFrag answer ranked 17th with five incorrect compounds. In so far as rank goes, the rank of MAGMa worsened and the rank of MetFrag improved by the submission of all detected fragment ions. P1 had a large number of candidates, many of which were clustered at close scores. Therefore, subtle differences in scoring resulted in large differences in rank.
Near the peak of 1,3-di-o-tolylguanidine, another peak with the second highest intensity was observed at an m/z difference of 28.0312, corresponding to C2H4. The m/z value was 212.1185. The ion closely resembled 1,3-di-o-tolylguanidine in its fragmentation pattern. Therefore, diphenylguanidine could easily be inferred. It was purchased from Tokyo Chemical Industry (Tokyo, Japan) to confirm the identification. After submitting queries of diphenylguanidine to both MAGMa and MetFrag, the correct answer ranked 3rd in 1,442 candidates and 88th in 1,499 candidates, respectively. In a similar way, fragmentation data were sequentially interpreted to determine compounds. All identification processes were completed by using authentic standard reagents. Identified compounds and their results with MAGMa and MetFrag are listed in Table 3. Detailed interpretation of each feature is provided in the electronic supplementary material. Identification could be completed for more than half of the features. Automated methods were able to retrieve the correct answers but did not always promote them to the top rank.

[Figure caption fragment: "... Table 2 also shown in the figure. The numbers of positions correspond to those in the rightmost row of Table 2."]
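Because several candidates often share the same score, a rank is only meaningful together with the number of ties. The sketch below shows one way to report both. The tie convention (rank equals one plus the number of strictly better candidates) is an assumption, since the text does not state how the tools or the tables resolve ties, and the candidate names and scores are invented for illustration.

```python
def rank_with_ties(scores, answer):
    """Return (rank, tied, total) for the correct answer.

    rank  : 1 + number of candidates with a strictly higher score
    tied  : number of other candidates sharing the answer's score
    total : number of retrieved candidates
    """
    target = scores[answer]
    better = sum(1 for score in scores.values() if score > target)
    tied = sum(1 for name, score in scores.items() if score == target and name != answer)
    return better + 1, tied, len(scores)


# Hypothetical candidate scores (higher is better).
scores = {
    "candidate_A": 0.9,
    "candidate_B": 0.9,
    "candidate_C": 0.8,
    "correct_answer": 0.8,
    "candidate_E": 0.8,
    "candidate_F": 0.5,
}
rank, tied, total = rank_with_ties(scores, "correct_answer")
print(f"{rank}/{total} ({tied})")
```

With many candidates packed into a few distinct score values, a small change in score can move the answer across a large block of tied candidates, which is the effect described above for P1.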
In the case of negative ion mode, the top four features had a constant interval of m/z 14.015 corresponding to CH2. Their chromatograms are shown in Fig. 3 and indicate the presence of many isomers. Furthermore, in their fragmentation, these substances shared a principal fragment ion, m/z 183.0110. Neutral losses from the precursor ions corresponded to the values of CnH2n+2. This indicates the presence of an identical substructure and an alkyl chain with a different chain length and a different substitution position. The m/z values of the common fragment ions were 183.0110 and 119.0491. The isotopologue composition of these fragment ions was different in X+2. The mass defect of these fragment ions indicated a loss of an element with a large negative mass defect such as sulfur.
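A constant CH2 interval between features can be checked mechanically. The following sketch searches a list of m/z values for pairs separated by the CH2 mass; the function name and the 0.002 tolerance are assumptions, and the input values are the monoisotopic masses of the C10 to C13 LAS anions quoted in this article.

```python
CH2 = 14.01565  # monoisotopic mass of CH2 (12.00000 + 2 x 1.007825)


def homologous_pairs(mz_values, spacing=CH2, tol=0.002):
    """Return all (low, high) pairs whose m/z difference matches the spacing."""
    values = sorted(mz_values)
    pairs = []
    for index, low in enumerate(values):
        for high in values[index + 1:]:
            if abs((high - low) - spacing) <= tol:
                pairs.append((low, high))
    return pairs


las_anions = [297.1530, 311.1686, 325.1843, 339.1999]
pairs = homologous_pairs(las_anions)
for low, high in pairs:
    print(f"{low:.4f} -> {high:.4f} (difference {high - low:.4f})")
```

Calling the same function with `spacing=28.0313` would look for series spaced by C2H4 instead.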
The accurate mass difference of 63.9619 between the two fragment ions corresponded exactly to sulfur dioxide. The elemental composition candidates of the fragment ion 119.0491 included C6H5N3 and C8H7O. C8H7O was more likely because the RDB of C6H5N3 was integral, which was not consistent with its anionic form. The RDB of C8H7O was 5.5 and suggested the presence of a phenyl ring. It was possible that the precursors had amphipathic structures such as sulfonate-type surfactants. The loss of CnH2n+2 is typical fragmentation in anionic surfactants. 12) In Japan, linear alkyl benzenesulfonate (LAS) is a leading detergent with an annual manufactured and imported quantity of 48,160 t in FY 2013. 13) The monoisotopic masses of its anionic forms are 297.1530 (4-decan-x-ylbenzenesulfonate, x=2-5, C10 LAS), 311.1686 (4-undecan-x-ylbenzenesulfonate, x=2-6, C11 LAS), 325.1843 (4-dodecan-x-ylbenzenesulfonate, x=2-6, C12 LAS), and 339.1999 (4-tridecan-x-ylbenzenesulfonate, x=2-7, C13 LAS). These values coincided with the values observed (N1-N4 in Table 1). The monoisotopic masses of their dimers also coincided (N11 and N12). The product ion, m/z 183 (in nominal mass), is well known in typical LAS analyses by LC/MS/MS. 14) Eventually, the identification of LAS was confirmed by using reference standards (Wako Pure Chemical Industries, Ltd., Osaka, Japan). These m/z values were submitted to both MetFrag and MAGMa. MAGMa and MetFrag retrieved 219 and 446 candidates for C10 LAS, respectively. The correct answer ranked 1st in MAGMa and 4th in MetFrag. However, numerous candidates ranked at the same position as the correct answer. Although the retrieved candidates included LAS, neither tool could explain these fragment ions. In the fragmentation of sulfonate-type surfactants, a product ion, SO3−, is conceivable. The present collision energy, however, was not enough to generate SO3−, which would have improved the results. Identified compounds and their results with MAGMa and MetFrag are listed in Table 4. Detailed interpretation of each compound is provided in the electronic supplementary material.

Table 3. Identified compounds in positive ion mode, number of ions in queries, and results by two automated methods.
* The denominators, numerators, and parentheses, respectively, denote the number of retrieved substances, the order of the correct answer, and the number of substances with the same score as the correct answer.

* The denominators, numerators, and parentheses, respectively, denote the number of retrieved substances, the order of the correct answer, and the number of substances with the same score as the correct answer. ** The query was submitted under the condition of 5 ppm tolerance and halogen inclusion.
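The RDB values and the sulfur dioxide mass difference used in the interpretation of the LAS fragment ions can be checked with a few lines. This sketch assumes the usual RDB formula for compounds of C, H, N, O, and S (C − H/2 + N/2 + 1, with O and S not contributing) applied directly to the ion formula; the function name `rdb` is hypothetical.

```python
def rdb(c=0, h=0, n=0):
    """Ring and double bond equivalents: C - H/2 + N/2 + 1.

    Divalent atoms (O, S) do not change the value. Applied to an ion formula,
    an even-electron ion gives a half-integer and an odd-electron ion an integer.
    """
    return c - h / 2 + n / 2 + 1


rdb_c8h7o = rdb(c=8, h=7)         # candidate C8H7O
rdb_c6h5n3 = rdb(c=6, h=5, n=3)   # candidate C6H5N3

# Monoisotopic mass of SO2 from standard isotope masses of 32S and 16O.
so2_mass = 31.97207117 + 2 * 15.99491462
observed_difference = 183.0110 - 119.0491

print(f"RDB(C8H7O)  = {rdb_c8h7o}")
print(f"RDB(C6H5N3) = {rdb_c6h5n3}")
print(f"SO2 = {so2_mass:.4f}, observed difference = {observed_difference:.4f}")
```

The half-integer value for C8H7O is what is expected for an even-electron anion such as a deprotonated species, whereas the integer value for C6H5N3 would imply a radical ion.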
Schymanski et al. have proposed a level system to facilitate the communication of compound identification confidence in the field of environmental research. 15) According to the level system, the 16 present compounds reached Level 1 confidence (Confirmed structure). These compounds consisted of pharmaceutical compounds, surfactants, a flame retardant, and other industrial materials. Osaka City has a population of about 2.6 million and is situated at the mouth of the Yodo River on Osaka Bay. There are many streams and watercourses, which often become stagnant or even flow backwards. Hence, the aquatic environment is considerably influenced by human activity. 16,17) The compounds found in the present study also reflected this situation. Crotamiton, telmisartan, and clarithromycin are common pharmaceutical compounds that are often targeted in the context of environmental research. 18,19) In Japan, more than 6 million patients suffer from high blood pressure. 20) The unequivocal detection of telmisartan is therefore quite plausible. However, several compounds such as bicalutamide and 1,3-di-o-tolylguanidine have rarely been targeted. 1,3-Di-o-tolylguanidine is known not only as a vulcanization accelerator in rubber production but also as a ligand of the σ receptor in the central nervous system, 21) and it is designated as a Monitoring Chemical Substance under Japan's Act on the Evaluation of Chemical Substances and Regulation of Their Manufacture, etc. Its annual manufactured and imported quantity in Japan was over 100 t from FY 2007 to FY 2009. 22) Subsequent environmental research on these compounds is expected.
In Schymanski's level system, several features remained at Level 5 (Exact mass). Several of them were related by a constant m/z difference. Figure 4 shows an example of two homologous series of ions with an m/z difference of 28.031 corresponding to C2H4. Features P4, P8, P15, and P19 were included in these series. Although m/z differences of 14.015 were found between the series, it seemed that the two series had different structures because the retention times were not consecutive. These features were remarkable, but it is unusual to keep a record of such features in scientific communication because their exact identification cannot be ascertained. Even though their formulas could not be assigned, we believe that the exchange of such data under assignment Level 5 is still meaningful for a better understanding of chemical substances in the environment.

CONCLUSION
In this study, nontarget analysis was conducted to identify anthropogenic compounds in an urban aquatic environment and two automated methods were evaluated using the results. Both MAGMa and MetFrag retrieved correct answers in most queries. They also usually retrieved many incorrect answers with the same score as the correct answer. Increasing the number of submitted fragment ions did not always improve the results. Both tools appeared to be equal in their results in the present sample, but MAGMa offered a great advantage in terms of processing time.
To explain the fragment ions in a query, MAGMa and MetFrag disconnect bonds in numerous compounds from molecular structure databases. This procedure is impossible to conduct manually, and it considerably reduces the burden on humans in reading fragmentation. Of course, that reason is not enough to recommend the procedure, and it should be complemented from broader perspectives such as chemical reaction probability, isotope pattern, consistency between precursor structure and product structure, and specific neutral loss of a substituent group or element, as described in the electronic supplementary material. In addition to the tools used here, many automated tools that include different perspectives are available. To complement legacy target analysis for environmental risk evaluation of chemical compounds, new methodology is anticipated. Such future development would also be a breakthrough in the field of environmental research.