The MAGMa software for automatic annotation of mass spectrometry-based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignment) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used for candidate ranking. In 6 of the 12 submitted solutions in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top-ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of an appropriate mass tolerance and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effect of the exhaustiveness of the candidate lists and the limitations of substructure-based scoring are discussed.
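The idea of matching MS/MS peaks against candidate substructures within a mass tolerance, and ranking candidates by an accumulated penalty, can be illustrated with a minimal sketch. This is not MAGMa's actual scoring function: the `Substructure` class, the `penalty` field, the default `tol_da`, and the `unexplained_penalty` value are all hypothetical names and numbers chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Substructure:
    mass: float     # m/z the substructure would be observed at (assumed precomputed)
    penalty: float  # hypothetical cost of generating it, e.g. bonds broken


def candidate_score(peaks, substructures, tol_da=0.005, unexplained_penalty=10.0):
    """Sum, over all peaks, the cheapest matching substructure penalty.

    A peak with no substructure inside the mass tolerance receives a fixed
    (hypothetical) penalty. Lower totals mean a better explanation.
    """
    total = 0.0
    for mz in peaks:
        matches = [s.penalty for s in substructures if abs(s.mass - mz) <= tol_da]
        total += min(matches) if matches else unexplained_penalty
    return total


def rank_candidates(peaks, candidates, **kwargs):
    """Return candidate names ordered from best (lowest) to worst score.

    `candidates` maps a candidate name to its list of Substructure objects.
    """
    return sorted(
        candidates,
        key=lambda name: candidate_score(peaks, candidates[name], **kwargs),
    )
```

In this toy model, a tolerance that is too tight leaves every peak unexplained, so all candidates receive the same score and the ranking carries no information. That is one simple way to see why an appropriate mass tolerance matters.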
The Critical Assessment of Small Molecule Identification (CASMI) contest was initiated in 2012 to evaluate manual and automated strategies for the identification of small molecules from raw mass spectrometric data. The authors participated in both category 1 (molecular formula determination) and category 2 (molecular structure determination) of the second annual CASMI contest (CASMI 2013) using slow but effective manual methods. The provided high-resolution mass spectrometric data were interpreted manually using a combination of molecular formula calculators, fragment and neutral loss analysis, literature consultation, manual database searches, deductive logic, and experience. The authors submitted correct formulas as lead candidates for all 16 challenges and correct structures as lead candidates for 14 of the 16 challenges. One structure submission (Challenge 3) was very close but not exact (N2-acetylglutaminylisoleucinamide instead of the correct N2-acetylglutaminylleucinamide). No solution was submitted for one challenge (Challenge 13) because the provided fragmentation pattern could not be reconciled with any known structure having the provided molecular composition.
The combination of partitioning and systematic bond disconnection has been used to identify compounds from accurate-mass fragmentation data. This combination is very effective in excluding wrong answers that occur by chance. However, both processes are CPU-intensive. This paper describes a novel data structure for representing molecules in a computer-readable format that is conducive to very rapid mass spectral searching while still retaining the advantages of partitioning and systematic bond disconnection.
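As a rough illustration of what systematic bond disconnection involves, and why it becomes expensive when repeated over many candidates, the sketch below cuts each bond of a small molecular graph in turn and reports the masses of the resulting pieces. It is a generic toy, not the data structure described in the paper, and it leaves partitioning aside. The input format (a list of `(element, implicit_hydrogen_count)` atoms plus index pairs for bonds) and the function name `fragment_masses` are assumptions made for this example. Masses are for neutral pieces with no hydrogen rearrangement or charge.

```python
# Monoisotopic masses (Da) for the elements used in this sketch.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def fragment_masses(atoms, bonds):
    """Cut every bond once and return the piece masses for each cut.

    atoms: list of (element, implicit_h_count) tuples for heavy atoms
    bonds: list of (i, j) index pairs between heavy atoms
    Returns a dict mapping each cut bond to a sorted list of piece masses.
    Cutting a ring bond leaves a single piece.
    """
    results = {}
    for cut in bonds:
        # Build adjacency lists for the graph without the cut bond.
        adjacency = {i: [] for i in range(len(atoms))}
        for i, j in bonds:
            if (i, j) != cut:
                adjacency[i].append(j)
                adjacency[j].append(i)

        # Depth-first search to collect connected components.
        seen, masses = set(), []
        for start in range(len(atoms)):
            if start in seen:
                continue
            stack, component = [start], []
            seen.add(start)
            while stack:
                atom = stack.pop()
                component.append(atom)
                for neighbour in adjacency[atom]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
            masses.append(
                sum(MONO[atoms[a][0]] + atoms[a][1] * MONO["H"] for a in component)
            )
        results[cut] = sorted(masses)
    return results
```

For ethanol, written here as `[("C", 3), ("C", 2), ("O", 1)]` with bonds `[(0, 1), (1, 2)]`, cutting the C-C bond yields CH3 and CH2OH pieces, and cutting the C-O bond yields C2H5 and OH pieces. Each cut requires a fresh graph traversal, which hints at why a precomputed representation can speed up searching.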
The second Critical Assessment of Small Molecule Identification (CASMI) contest took place in 2013. A joint team from the Swiss Federal Institute of Aquatic Science and Technology (Eawag) and the Leibniz Institute of Plant Biochemistry (IPB) participated in CASMI 2013 with an automatic workflow-style entry. MOLGEN-MS/MS was used for Category 1, molecular formula calculation, restricted by the information given for each challenge. MetFrag and MetFusion were used for Category 2, structure identification, retrieving candidates from the compound databases KEGG, PubChem and ChemSpider and merging these lists before submission. The results from Category 1 were used to guide whether formula or exact mass searches were performed for Category 2. The Category 2 results were impressive considering the database size and automated regime used, although they could not compete with the manual approach of the contest winner. The Category 1 results were affected by large m/z and ppm values in the challenge data, where other participants' strategies that went beyond pure enumeration were more successful. However, the combination used for the CASMI 2013 entries was extremely useful for developing decision-making criteria for automatic, high-throughput general unknown (non-target) identification and for future contests.
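The sensitivity of formula enumeration to m/z and ppm values can be seen in a minimal brute-force sketch. This is not MOLGEN-MS/MS: it is a hypothetical enumerator restricted to the elements C, H, N and O, it works on neutral monoisotopic masses, and it applies no chemical plausibility filters. The function names `ppm_error`, `formula_mass` and `enumerate_formulas` are invented for this example.

```python
# Monoisotopic masses (Da) for the elements considered in this sketch.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def formula_mass(formula):
    """Monoisotopic mass of a (c, h, n, o) atom-count tuple."""
    c, h, n, o = formula
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def enumerate_formulas(neutral_mass, ppm):
    """List every (c, h, n, o) whose mass lies within +/- ppm of neutral_mass.

    Pure enumeration: no valence, ring/double-bond, or isotope checks.
    """
    tol = neutral_mass * ppm / 1e6
    hits = []
    for c in range(int((neutral_mass + tol) // MONO["C"]) + 1):
        rest_c = neutral_mass - c * MONO["C"]
        for n in range(int((rest_c + tol) // MONO["N"]) + 1):
            rest_n = rest_c - n * MONO["N"]
            for o in range(int((rest_n + tol) // MONO["O"]) + 1):
                rest_o = rest_n - o * MONO["O"]
                # Only the nearest hydrogen count can fall inside the window,
                # because neighbouring counts differ by about 1 Da.
                h = round(rest_o / MONO["H"])
                if h < 0:
                    continue
                formula = (c, h, n, o)
                if abs(formula_mass(formula) - neutral_mass) <= tol:
                    hits.append(formula)
    return hits
```

Calling `enumerate_formulas(180.06339, 5.0)` includes `(6, 12, 0, 6)`, the formula C6H12O6. Because the absolute tolerance window grows with both the mass and the ppm value, the candidate list can only grow as either increases, which is consistent with pure enumeration struggling on high-mass or low-accuracy challenges.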
We present the results of a fully automated de novo approach for identification of molecular formulas in the CASMI 2013 contest. Only results for Category 1 (molecular formula identification) were submitted. Our approach combines isotope pattern analysis and fragmentation pattern analysis and is completely independent of any spectral or structural database. We correctly identified the molecular formula for ten out of twelve challenges, making this the best automated method competing in this category.
CASMI (Critical Assessment of Small Molecule Identification) is a contest in which participants identify the molecular formula and chemical structure of challenge molecules using blind mass spectra as the challenge data. Seven research teams participated in CASMI2013. The winner of CASMI2013 was the team of Andrew Newsome and Dejan Nikolic of the University of Illinois at Chicago, IL, USA. The team identified 15 of the 16 challenge molecules by manually interpreting the challenge data and by searching in-house and public mass spectral databases as well as chemical substance and literature databases. MAGMa was selected as the best automated tool of CASMI2013. In some challenges, most of the automated tools successfully identified the challenge molecules, independent of the compound class and the magnitude of the molecular mass. In these challenge data, all of the isotope peaks and the product ions essential for the identification were observed within the expected mass accuracy. In the other challenges, most of the automated tools failed, or identified the correct solution together with many false-positive candidates. We then analyzed these challenge data based on the quality of the mass spectra, the dissociation mechanisms, and the compound class and elemental composition of the challenge molecules.
The CASMI 2013 (Critical Assessment of Small Molecule Identification 2013, http://casmi-contest.org/) contest was held to systematically evaluate strategies used for mass spectrometry-based identification of small molecules. The results of the contest highlight that, because of the extensive efforts made towards the construction of databases and search tools, database-assisted small molecule identification can now automatically annotate some metabolite signals found in metabolome data. In this commentary, the current state of metabolite annotation is compared with that of transcriptomics and proteomics. The comparison suggests that certain limitations in the metabolite annotation process need to be addressed, such as (i) the completeness of the database, (ii) the conversion between raw data and structure, (iii) the one-to-one correspondence between measured data and correct search results, and (iv) the false discovery rate in database search results.