Mass Spectrometry
Online ISSN : 2186-5116
Print ISSN : 2187-137X
ISSN-L : 2186-5116
Review
Technical Challenges in Mass Spectrometry-Based Metabolomics
Fumio Matsuda
Open access journal

2016, Volume 5, Issue 2, p. S0052

Abstract

Metabolomics is a strategy for the analysis and quantification of the complete collection of metabolites present in biological samples. It is an emerging area of scientific research with many applications, including clinical, agricultural, and medical research, where biomarker discovery and metabolic system analysis are performed by widely targeted analysis of a few hundred preselected metabolites from 10–100 biological samples. Further improvement in mass spectrometry technologies, in terms of experimental design for larger scale analysis, computational methods for tandem mass spectrometry-based elucidation of metabolites, and specific instrumentation for advanced bioanalysis, will enable more comprehensive metabolome analysis for exploring the hidden secrets of metabolism.

INTRODUCTION

Metabolism is one of the most essential processes in all living organisms. The metabolome, which denotes the composition of all the metabolites in an organism, has been considered the ultimate determinant of the phenotype of an organism, since it is the result of interactions between genetic, epigenetic, and non-genetic factors including age, environment, disease, drugs, nutrition, and lifestyle. The general aim of mass spectrometry (MS)-based metabolomics is to list all the metabolites present in target samples and to determine their concentrations by using capillary electrophoresis (CE)/MS, gas chromatography (GC)/MS, and liquid chromatography (LC)/MS.1)

Metabolomics has many applications in clinical, agricultural, and medical research. For instance, a search of the PubMed database (10 Aug 2016) with the keyword “metabolomics” returned 13,324 results, suggesting that more than one thousand papers on metabolomics have been published in recent decades. One example of the applications of metabolomics is the comparative study of normal and diseased states. Comparing the metabolic profiles of normal and diseased samples yields a list of candidate biomarker metabolites that can be used in disease diagnosis and in uncovering the molecular mechanisms underlying the diseases.2–4) Moreover, novel diagnostic and preventive methods are expected to be found by cohort studies based on large metabolome datasets acquired over a long period of time.5,6) For this purpose, robustness is necessary both in acquiring data from thousands of biological samples and in comparing the concentrations of the various metabolites in those samples.

In addition, metabolomics has been employed for evaluating the safety of genetically modified (GM) crops and ensuring their substantial equivalence to wild types.7–9) A list of metabolites whose levels were altered in the GM crops has been compiled using metabolome analysis, and their structures were elucidated by high resolution and tandem mass spectrometry.10) Furthermore, there should be many hitherto undiscovered natural products with novel chemical structures that could be used as lead compounds for the development of novel pharmaceuticals.11) Metabolomics is a promising tool for exploring the diversity of secondary metabolites produced by various microbes and plants and for finding metabolites with novel structures.12)

As mentioned above, data acquisition from thousands of samples and precise quantification of all the metabolites in those samples are technically challenging. The technology of metabolome analysis is still in its infancy, enabling the metabolic profiling of only up to one hundred biological samples by relative quantification of 50–200 known and unknown metabolites detectable by LC/MS, GC/MS, and CE/MS.13) This article describes the technical challenges, achievements, and limitations of current metabolome analysis using mass spectrometry, and discusses the breakthrough innovations required to overcome the present-day bottlenecks. There are many comprehensive review articles on the basic concepts, methodology, and examples of the application of metabolomics, which are cited in the subsequent sections of this article.11,14–19)

TARGETED VS. NON-TARGETED METABOLOMICS

There are targeted and non-targeted approaches in the quantitative analysis of metabolite content.17) In usual “targeted” analyses, the analytical conditions are first optimized and calibration curves are prepared using standard compounds of preselected metabolites; raw data are then obtained from the samples of interest, from which the concentrations of the target metabolites are determined.18) A limitation of the targeted approach is that unexpected changes in the concentrations of other metabolites are ignored. In “non- or un-targeted” analysis, on the other hand, the samples of interest are first analyzed by a general analytical method. Then, a data matrix including signal intensity data of all observed metabolite peaks in the chromatograms is constructed using specialized software. The recent development of a series of peak picking software packages enables the construction of a high-quality data matrix.20,21) Structure-related information for each metabolite signal is annotated by searching databases for the retention time, the m/z in the mass spectra, and the fragmentation patterns in the tandem mass spectra.22,23) An advantage of the non-targeted approach is that the data matrix theoretically includes quantitative information on all metabolite signals observed in the chromatograms. It has been estimated that approximately 100–1000 metabolites are recorded in a usual non-targeted metabolome data acquisition experiment.24,25)
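The database-search step of annotation can be illustrated with a minimal sketch that matches observed m/z values against a reference list within a mass tolerance. The three-entry reference list, the 10 ppm tolerance, the peak values, and the function name `annotate` are assumptions made for illustration only; real annotation also uses retention time and tandem mass spectra, which are not modeled here.

```python
# Hypothetical reference list: (name, monoisotopic neutral mass in Da).
REFERENCE = [
    ("alanine", 89.04768),
    ("glucose", 180.06339),
    ("citric acid", 192.02700),
]
PROTON = 1.007276  # mass of a proton in Da


def annotate(mz, tol_ppm=10.0, mode="neg"):
    """Return reference names whose [M-H]- (neg) or [M+H]+ (pos) m/z
    lies within tol_ppm of the observed m/z."""
    shift = -PROTON if mode == "neg" else PROTON
    hits = []
    for name, mass in REFERENCE:
        expected = mass + shift
        if abs(mz - expected) / expected * 1e6 <= tol_ppm:
            hits.append(name)
    return hits


# Made-up peak list from a negative-mode run.
peaks = [191.0197, 179.0561, 250.1234]
annotations = {mz: annotate(mz) for mz in peaks}
```

A peak with no hit (the third one here) stays unannotated, which is the situation the following sections describe for most signals in non-targeted data.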

BOTTLENECK IN THE METABOLOMICS 1: STRUCTURAL ELUCIDATION OF UNKNOWN METABOLITES

Success in the non-targeted approach depends on the enrichment of metabolite annotation by database searching. A complete list of human and other naturally occurring metabolites does not exist, because such a list cannot be derived from genome information; metabolite lists therefore have to be constructed for each target species and sample. Metabolite annotation by searching databases for the retention time, the m/z in the mass spectra, and the fragmentation patterns in the tandem mass spectra has been attained for metabolites whose measured reference data are available in public databases such as MassBank (http://www.massbank.jp/),26) HMDB (http://www.hmdb.ca/),27) and MZcloud (https://www.mzcloud.org/). This means that the annotatable metabolites in metabolome data are limited to commercially available metabolites, since most of the measured MS and MS/MS data in the public databases are obtained from commercially available authentic standards of biological metabolites.23) The number of metabolites that can be annotated using these databases ranges from 50 to 200 depending on the samples. Therefore, reporting tandem mass spectrum data of rare biological metabolites and compiling them into databases, as performed by HMDB and ReSpect (http://spectra.psc.riken.jp/),28) is important for enriching metabolome annotation data.

Although MS/MS spectra-based identification of biomarker metabolites and plant secondary metabolites has been reported,29–32) structural elucidation of unknown metabolites from MS and MS/MS data is still a difficult task owing to the lack of diversity in the measured MS/MS data in the databases and the poor methodology for similarity searching.33,34) It should be noted that one of the most straightforward ways to identify a metabolite structure with certainty is still the spectroscopic analysis of the isolated metabolite.10,35,36)
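As an illustration of what a similarity search over MS/MS spectra involves, the sketch below scores a query spectrum against library spectra with a binned cosine similarity. This is one common scoring scheme, not necessarily the one used by the databases cited above; the spectra, the 0.01 m/z bin width, and the candidate names are invented for the example.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up query and library spectra as (m/z, relative intensity) pairs.
query = [(87.01, 30.0), (111.01, 100.0), (173.01, 15.0)]
library = {
    "candidate A": [(87.01, 25.0), (111.01, 100.0), (173.01, 20.0)],
    "candidate B": [(59.01, 100.0), (89.02, 40.0)],
}
ranked = sorted(
    ((name, cosine_similarity(query, spec)) for name, spec in library.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

A search of this kind can only rank spectra that are already in the library, which is why limited library diversity caps the number of metabolites that can be annotated.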

NEW APPROACH IN THE METABOLOMICS: WIDELY TARGETED ANALYSIS

The metabolite profile data of unknown metabolites are of limited use, because structural annotation of metabolites is essential for investigating disease mechanisms and biomarker significance. This implies that targeted analysis that includes the maximum possible number of annotatable metabolites is one of the practical strategies for metabolome analysis in biological studies.37) Sawada et al. reported widely targeted metabolome analysis using liquid chromatograph-triple quadrupole-mass spectrometers (TQ-MS) by developing a selected reaction monitoring series for 378 metabolites.38) Similar approaches are employed for the comprehensive analysis of lipids using liquid chromatograph-TQ-MS (lipidomics)39) and the analysis of primary metabolites using gas chromatograph-TQ-MS.40) Although the number of measurable metabolites depends on the samples of interest, it has been reported that approximately 100–200 metabolites are usually analyzed by widely targeted analysis. The widely targeted approach is now a productive and realistic strategy in metabolomics because the analysis can be performed using the more popular TQ-MS and the default peak picking software developed by manufacturers.18,41)

BOTTLENECK IN THE METABOLOMICS 2: QUANTIFICATION WITHOUT VALIDATION AND QUALITY CONTROL

In quantitative analysis of metabolites using LC/MS, raw signal intensity or peak area values are converted into metabolite concentrations by using external or internal standards. This is because raw signal intensity depends on the condition of the mass spectrometer and the magnitude of ion suppression derived from the sample matrix. On the other hand, since preparing calibration curves and finding suitable internal standards for all target metabolites is unrealistic, raw signal intensity has been used as an indicator of the metabolite level in non-targeted and widely targeted metabolomics studies.24) Even though simplified normalization methods such as the addition of global internal standards have been reported, metabolome analysis has generally been performed with poor quality control of the quantification data and little validation of the analytical method used.42) Using raw signal intensity data hampers the comparison of newly acquired metabolome data with data obtained earlier. A comparison of a targeted and a non-targeted metabolomics assay of plasma samples revealed substantial inconsistencies between the two sets of results due to drift in the condition of the mass spectrometer.43) Thus, a suitable size for a single study is likely to be fewer than 100 samples, since the condition of the mass spectrometer can be considered to remain constant only during the acquisition of about 100 analysis runs over 2–3 days.
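The conversion of peak areas into concentrations with an internal standard can be sketched as a linear calibration curve fitted to the analyte/internal-standard area ratio. The calibration points, peak areas, and function names below are hypothetical, and a simple unweighted least-squares fit is assumed; validated methods often use weighted fits and check linearity and accuracy.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up calibration data: standard concentrations (uM) and the measured
# ratio of analyte peak area to internal-standard peak area.
std_conc = [1.0, 5.0, 10.0, 50.0]
area_ratio = [0.02, 0.10, 0.20, 1.00]
slope, intercept = fit_line(std_conc, area_ratio)


def to_concentration(analyte_area, is_area):
    """Convert a sample's peak areas into a concentration via the curve."""
    return (analyte_area / is_area - intercept) / slope


sample_conc = to_concentration(analyte_area=4.0e5, is_area=1.0e6)
```

Each target metabolite needs its own curve and a suitable internal standard, which is the effort that becomes unrealistic for hundreds of metabolites.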

A meta-analysis of metabolite markers suggests that several blood amino acids appear to be consistently associated with the risk of developing type 2 diabetes.44) However, there are few meta-analysis studies that merge several distinct metabolomics datasets, owing to the poor quality control of metabolome data. Another reason is that there has been no definitive and versatile inlet method that is able to simultaneously separate a wide variety of metabolites. In the case of LC/MS metabolomics, there are at least 5 groups of LC conditions: reverse-phase analysis using an octadecylsilyl (ODS) column for relatively hydrophobic metabolites,45) normal-phase analysis using columns for hydrophilic interaction chromatography (HILIC) of hydrophilic metabolites,46–48) reverse-phase analysis using a pentafluorophenylpropyl stationary phase column for hydrophilic metabolites,49) ion-pairing methods using an ODS column for anionic metabolites,50) and lipidomics methods using an ODS column for lipids. Recently, a comparison of the chromatographic performance of a traditional ODS column with various HILIC and mixed-mode columns showed that ODS and zwitterionic HILIC columns are the best combination for wide metabolite coverage.51) The poor quality control and the incompatibilities among the data acquisition procedures seriously hamper meta-analysis of biomarkers using metabolome datasets.

KILLER APPLICATION OF METABOLOMICS: BIOMARKER DISCOVERY AND DISEASE METABOLOMICS

As discussed above, the current state of metabolome analysis enables the metabolic profiling of up to about one hundred biological samples by relative quantification of 50–200 known and unknown metabolites detectable by LC/MS, GC/MS, and CE/MS. Despite the technical limitations, there are a large number of applications of metabolome analysis. For example, Soga et al. reported that ophthalmic acid was identified as a biomarker indicating hepatic glutathione consumption, using differential non-targeted metabolome analysis of approximately 20 samples by CE/MS.29) Yoshida et al. obtained widely targeted metabolome data including 87 metabolites by CE and GC/MS from approximately 100 samples, for systematic prediction of yeast life span.52) Nishiumi et al. reported a novel serum metabolomics-based diagnostic approach for colorectal cancer from the GC/MS-based widely targeted analysis of 132 metabolites from approximately 60 samples.53) Following these pioneering works, metabolomics has been applied in medical research both to confirm existing hypotheses of disease mechanisms and to formulate new hypotheses from metabolome data that include hundreds of annotatable metabolites obtained by comparative analyses of 50–100 normal and disease samples. Indeed, many review articles have been published on the application and contribution of metabolomics to research into diseases such as diabetes,54,55) cancer,56,57) cardiovascular disease,58) depression,59) and rheumatic diseases.60)

Biomarker discovery has been another application of metabolomics. Candidate biomarker metabolites associated with a disease or with the efficacy of a particular drug treatment can be determined from the comparative metabolome analysis of 50–100 samples.58,61,62) Metabolomics is becoming one of the promising methods for biomarker discovery, as indicated by the fact that more than 25% of the publications in PubMed with the keyword “metabolomics” describe biomarker discovery work.
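A comparative analysis of this kind can be sketched as ranking metabolites by how strongly their levels differ between two groups. The example below uses the log2 fold change and Welch's t statistic, which are a common choice rather than the specific method of the cited studies; the metabolite names and intensities are invented, and a real study would also need p-values with multiple-testing correction and independent validation.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )


def rank_candidates(disease, control):
    """Return (name, log2 fold change, t) tuples sorted by |t|, largest first."""
    results = []
    for name in disease:
        d, c = disease[name], control[name]
        log2_fc = math.log2(mean(d) / mean(c))
        results.append((name, log2_fc, welch_t(d, c)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Made-up signal intensities for two metabolites in four samples per group.
disease = {
    "metabolite X": [200.0, 220.0, 210.0, 190.0],
    "metabolite Y": [50.0, 52.0, 49.0, 51.0],
}
control = {
    "metabolite X": [100.0, 110.0, 95.0, 105.0],
    "metabolite Y": [50.0, 51.0, 49.0, 52.0],
}
ranked_candidates = rank_candidates(disease, control)
```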

Now that the performance and applications of metabolomics have been established, the development of infrastructure has accelerated to meet the needs of the many users of metabolomics. Method packages for widely targeted metabolome analysis using LC-MS are commercially available, making it possible for researchers to conduct metabolome analysis without first having to develop the analytical method (for example, the LC-MS/MS method package for primary metabolites from Shimadzu, https://www.shimadzu.eu.com/lcmsms-method-package-primary-metabolites). Sophisticated data processing software has been developed based on smart peak picking, normalization, missing value imputation, transformation, scaling, and multivariate analysis methods (such as Traverse MS by Reifycs, https://www.reifycs.com/).25,63,64) Visualization environments and interpretation tools such as VANTED (https://immersive-analytics.infotech.monash.edu/vanted/) and metabolite set enrichment analysis via the Metaboanalyst Web page (http://www.metaboanalyst.ca/) are also available for metabolome data.65–67) In addition to being an important tool for biological analysis, metabolomics has the potential to become an essential part of trans-omics analysis, which integrates other omics such as genomics, transcriptomics, and proteomics.68–70)
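Some of the processing steps named above can be sketched for a single metabolite as follows. Half-minimum imputation, log2 transformation, and autoscaling (mean-centering to unit variance) are assumed here as typical choices; they are not claimed to be what any of the cited packages does, and the data matrix is invented.

```python
import math
from statistics import mean, stdev


def preprocess(values):
    """Impute, log-transform, and autoscale one metabolite across samples.

    values: raw intensities; None marks a missing value.
    """
    observed = [v for v in values if v is not None]
    fill = min(observed) / 2.0  # half-minimum imputation
    logged = [math.log2(v if v is not None else fill) for v in values]
    mu, sd = mean(logged), stdev(logged)
    return [(x - mu) / sd for x in logged]  # autoscaling


# Made-up data matrix: metabolite -> intensities in four samples.
matrix = {
    "metabolite A": [1200.0, 1500.0, None, 900.0],
    "metabolite B": [40.0, 80.0, 160.0, 320.0],
}
scaled = {name: preprocess(vals) for name, vals in matrix.items()}
```

After scaling, every metabolite contributes on the same scale to a multivariate analysis regardless of its absolute signal intensity.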

OVERCOMING BOTTLENECK 1: COMPUTATIONAL MASS SPECTROMETRY FOR PREDICTION OF FRAGMENTATIONS

Peptide identification in proteomics is based on the prediction of tandem mass spectra from the amino acid sequence, using the relatively simple fragmentation rules of collision induced dissociation (CID) of peptides.71–73) Although a rule to account for mass shifts in fragmentations of even-electron organic ions has been proposed,74) precise prediction of the fragments produced by CID of target molecules continues to be challenging in computational mass spectrometry research.73,75–78) Since databases of naturally occurring metabolites have been developed and enriched by methods such as in silico derivatization of chemical structures,79,80) the prediction of tandem mass spectra from the metabolite structure, based on fragmentation rules,75) quantum chemistry simulations, and machine learning techniques,73) would be a key technology for metabolomics and computational mass spectrometry. For example, the competitive fragmentation modeling engine has been developed to produce a probabilistic generative model of the CID fragmentation process by machine learning techniques, which has been helpful in compound identification in GC-MS metabolomics.81,82) Moreover, a library of predicted MS/MS spectra was constructed by in silico fragmentation of possible human metabolites, including the 8,021 known endogenous human metabolites in the Human Metabolome Database (HMDB) and their 375,809 predicted metabolic products via one metabolic reaction in the Evidence-based Metabolome Library (EML).83) In addition to computational mass spectrometry, the construction of high-quality databases of MS and MS/MS spectra and a deeper understanding of the chemistry of fragmentation mechanisms are also essential for developing in silico fragmentation tools.84)
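To illustrate why peptide fragmentation is comparatively easy to predict, the sketch below computes singly charged b- and y-ion m/z values directly from an amino acid sequence. The residue table is deliberately partial and the peptide GASK is an arbitrary example; no comparably simple rule exists for metabolites, which is the challenge described above.

```python
# Monoisotopic residue masses (Da) for a few amino acids; a partial table
# chosen for the example, not a complete list.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Return (b_ions, y_ions) as m/z lists for singly charged fragments."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus a proton.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water and a proton.
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


b_ions, y_ions = by_ions("GASK")
```

Here the predicted spectrum follows from cleaving each backbone amide bond in turn, so a candidate sequence can be scored against a measured spectrum without any reference measurement.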

OVERCOMING BOTTLENECK 2: QUALITY CONTROL OF QUANTIFICATION DATA

Methodologies for large-scale metabolomics studies that merge two or more metabolome datasets have been investigated along two lines: preparation of stable isotope-labelled standard compounds and use of quality control (QC) samples. Recent large-scale metabolome analyses employed an experimental design using QC samples to correct the drift of the raw signal intensity during the analysis.85–87) The QC samples were prepared by mixing all the sample extracts in one analysis batch or in one metabolome analysis study. Repeated analyses of the QC sample were inserted at the start and end of each batch sequence and after every 4–8 actual samples. Since the QC sample theoretically includes all the metabolites observed in the samples of the batch, the time course of the signal intensity drift caused by the reduction in the sensitivity of the mass spectrometer can be monitored for each metabolite from the QC sample data. Biases in the signal intensities of the metabolome data obtained from the actual samples were then corrected based on these time courses.88) Although the QC sample-based method is not able to correct the effects of ion suppression, it is expected that integration of metabolome data could be attained by using an identical QC sample among different metabolomics studies and laboratories, as well as by normalizing the signal intensity data relative to those of the QC samples.89,90) It should be noted that QC sample-based integration and normalization of metabolome data depends on standardization of the experimental design for data acquisition among different metabolomics studies and laboratories.91)
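The QC-based correction can be sketched for one metabolite as follows. Linear interpolation between neighbouring QC injections is assumed here as the simplest drift model; published methods often fit smoother curves. The run sequence is shortened to one QC every three samples for brevity, and all intensities are invented.

```python
def drift_correct(order, intensity, is_qc):
    """Correct one metabolite's intensities using bracketing QC injections.

    Assumes the sequence starts and ends with a QC injection, so every run
    has a QC at or before it and at or after it.
    """
    qc = [(o, v) for o, v, q in zip(order, intensity, is_qc) if q]
    reference = sum(v for _, v in qc) / len(qc)  # mean QC level as target
    corrected = []
    for o, v in zip(order, intensity):
        before = max((p for p in qc if p[0] <= o), key=lambda p: p[0])
        after = min((p for p in qc if p[0] >= o), key=lambda p: p[0])
        if before[0] == after[0]:
            expected = before[1]  # this run is itself a QC injection
        else:
            frac = (o - before[0]) / (after[0] - before[0])
            expected = before[1] + frac * (after[1] - before[1])
        corrected.append(v * reference / expected)
    return corrected


# Made-up batch: QC at runs 1, 5, and 9; sensitivity falls steadily, so
# identical samples give decreasing raw intensities.
order = [1, 2, 3, 4, 5, 6, 7, 8, 9]
is_qc = [True, False, False, False, True, False, False, False, True]
raw = [1000.0, 487.5, 475.0, 462.5, 900.0, 437.5, 425.0, 412.5, 800.0]
corrected = drift_correct(order, raw, is_qc)
```

Because the correction is a ratio to the QC signal, it removes instrument drift but not sample-specific ion suppression, in line with the limitation noted above.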

Another approach is the determination of absolute concentrations of metabolites, since exact concentration data are comparable irrespective of the condition of the mass spectrometer and of differences in the data acquisition methods. One method for absolute quantification is to employ internal standards by preparing stable isotope-labelled standards of all target metabolites. Stable isotope-labelled standards and their mixtures are commercially available for some biological metabolites including amino acids, lipids, and sugars (http://www.isotope.com/). Determining the absolute concentrations of a wider range of metabolites by preparing stable isotope-labelled standards of other metabolites would aid metabolomics studies, but such an effort would be laborious and time-consuming.
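The reason a co-eluting stable isotope-labelled standard gives comparable concentrations can be shown with a small numerical sketch. It assumes that the labelled and unlabelled forms have the same response factor and experience the same ion suppression; the concentrations, response factor, and suppression values are invented.

```python
def measured_area(conc, response, suppression):
    """Toy signal model: area = concentration * response * suppression."""
    return conc * response * suppression


def isotope_dilution(area_light, area_heavy, spiked_conc):
    """Concentration of the unlabelled metabolite from the light/heavy ratio."""
    return area_light / area_heavy * spiked_conc


true_conc = 12.0  # uM, unlabelled metabolite in the sample (hypothetical)
spiked = 10.0     # uM, labelled standard added to the sample (hypothetical)

estimates = []
raw_areas = []
for suppression in (1.0, 0.6, 0.3):  # three matrices, different suppression
    light = measured_area(true_conc, 2.0e4, suppression)
    heavy = measured_area(spiked, 2.0e4, suppression)
    raw_areas.append(light)
    estimates.append(isotope_dilution(light, heavy, spiked))
```

The raw areas differ more than threefold across the three matrices while the estimated concentration stays the same, because the suppression factor cancels in the ratio.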

A mixture of 13C-labelled metabolites obtained from yeast cells cultured in a medium containing [U-13C] glucose has been used as an internal standard for the determination of intermediates of energy metabolism including sugar phosphates, organic acids, and cofactors.92) Although this method is useful for the metabolome analysis of microorganisms, its application to human biofluids and tissues is limited, since mammalian-specific metabolites such as creatine and carnitine are not present in yeast cells. Stable isotope-coded derivatization has also been introduced into LC-MS-based metabolomics in order to expand the range of detectable metabolites and to determine their levels by chemical derivatization using differently isotope-labelled reagents, which will be useful for focused metabolome analysis of several metabolite groups.93)

The above discussion indicates that an essential factor in a large-scale metabolomics project is the design of the experiment.91) The purpose of the project determines the level of precision required for the measurement of metabolite concentrations, according to which suitable experiments are designed by employing internal standard-based or QC sample-based methods. The experimental design and data acquisition for a large-scale metabolomics project require substantial manpower and machine resources as well as organizational and financial support. For example, the Phenome Centre project in the UK (http://www.imperial.ac.uk/phenome-centre) conducted a large-scale metabolome analysis of human biofluids required for translational research by using a QC sample-based experimental design and the mass spectrometry facilities used in the London 2012 Olympic Games.94,95) The development of next generation methodology for metabolomics would be driven by such core research centers of metabolomics.

RETHINKING INSTRUMENTATION OF MASS SPECTROMETRY FOR METABOLOMICS OR BIOANALYSIS

Further development of metabolome analysis requires advances in mass spectrometry that are free from the constraints of the beliefs and paradigms of the present generation of instrumentation. For example, many methods have been developed for the determination of mRNA levels in biological samples, based on separation by gel electrophoresis (GE) and detection by probe hybridization (northern analysis), PCR amplification of the target sequence followed by GE, real-time PCR, microarrays using hybridization to antisense probes, and RNA-seq using read counts from next generation sequencers.96) Digital PCR is a new technique for directly counting the copy number of mRNA in samples without using calibration curves.97) Transcriptomics research has thus been driven by the development of new technologies. On the other hand, versatile LC-MS instruments have been developed recently mainly for pharmacokinetics research, since high throughput analysis of basic drugs has been attained by the development of high-pressure binary gradient pumps with stainless steel tubing for reverse-phase chromatography, using a column of silica gel with chemically bound ODS groups and electrospray ionization (ESI). However, the target metabolites in the analysis of energy metabolism, which is one of the most important areas in metabolomics and bioanalysis, are hydrophilic and anionic compounds that include sugar phosphates, organic acids, and cofactors such as glucose-6-phosphate, citric acid, and ATP. Although several techniques have been employed for the metabolic profiling of energy metabolism-related intermediates using LC/MS,50,98) problems remain, such as broad peak shapes, insufficient separation of structural isomers, and severe ion suppression from the sample matrix.99) These problems offer an opportunity to reconsider LC/MS instrumentation for bioanalysis purposes, since metabolome analysis of biological samples is one of the largest applications of mass spectrometry.

It has been shown that one reason for the broad peak shape is the adsorption of anionic compounds to various stainless steel parts in LC/MS machines.100) Continuous evaluation of ultra bioinert components, such as metal-free instruments, columns, and needles for the ESI probe, is essential to overcome the problem of adsorption. New hybrid columns for the metabolome analysis of energy metabolism have been developed and commercialized by introducing an anion exchange mode into ODS, which is an important step in finding new de facto standards for bioanalysis.99) It is also known that several metal ions such as Mg2+ remaining in the sample extracts interact strongly with di- and trivalent anions, resulting in broader peak shapes for citric acid and ATP.101) This problem could be partly avoided by introducing a trap column or an online solid phase extraction technique for sample injection. The robustness of ATP analysis could be improved by avoiding contamination of the analytical column by metal ions.102) If a trap column is used for sample injection, sensitive analysis of energy metabolism-related intermediates could be accomplished by introducing nano-flow LC using narrow bore columns, as has been done in proteomics.103)

One of the problems intrinsic to metabolome analysis using LC/MS is ion suppression in ESI.104) In addition to improving the ionization efficiency of ESI, the development of new complete ionization methods for bioanalysis remains to be investigated, for example by reconsidering historic interfaces such as frit-FAB (fast atom bombardment) for LC/MS105) and the chemical ionization and derivatization techniques of GC-MS.19) As is the case with gene expression analysis, the final goal of quantitative analysis is the counting of molecules in samples without using a calibration curve. For innovations to occur in bioanalysis, several breakthroughs are required not only in separation methods but also in mass spectrometry.

CONCLUSION

Metabolomics now requires computational methods for tandem mass spectrometry-based elucidation of metabolites, experimental designs for larger scale analysis, and advanced instrumentation specific to bioanalysis by mass spectrometry. Advancement in the technology of mass spectrometry could directly contribute to metabolomics and other bioanalysis areas. Development of basic and fundamental technology is essential for the next breakthrough in mass spectrometry and metabolomics.

Acknowledgments

We thank Prof. Y. Izumi (Kyushu University) and Prof. Y. Sugiura (Keio University) for helpful comments on the manuscript. This work was partially supported by JST, Strategic International Collaborative Research Program, SICORP for JP-US Metabolomics, and a Grant-in-Aid for Scientific Research (C) No. 15K06579.

REFERENCES
 
© 2016 Fumio Matsuda. This is an open access article distributed under the terms of Creative Commons Attribution License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.