Mass Spectrometry
Online ISSN : 2186-5116
Print ISSN : 2187-137X
ISSN-L : 2186-5116
Review
A Brief Review of Bioinformatics Tools for Glycosylation Analysis by Mass Spectrometry
Pei-Lun Tsai, Sung-Fang Chen

2017 Volume 6 Issue 2 Pages S0064

Abstract

The purpose of this review is to provide updated information on bioinformatic software developed since 2013 for the characterization of glycosylated structures. A comprehensive review by Woodin et al. (Analyst 138: 2793–2803, 2013; ref. 1) introduced two main approaches for researchers new to this area: the analysis of released glycans and the identification of glycopeptides in enzymatic digests. Complementary to that report, this review focuses on mass spectrometry-related bioinformatics tools for the characterization of N-linked and O-linked glycopeptides. It also provides information on automated tools that can be used for glycan profiling by mass spectrometry.

INTRODUCTION

Overview of glycoprotein

Glycosylation, a specific enzymatic process in which glycans are attached to proteins or lipids, is an important biological process that plays a role in cell signaling, cell adhesion, and the regulation of biochemical pathways. Of all post-translational modifications (PTMs), glycosylation is one of the most commonly observed types.2) It is believed that more than 50% of proteins are glycosylated. Glycoproteins are involved in many types of biological processes. Therefore, automated tools for the identification of glycoproteins and the glycans attached to them have become fundamentally important. For analysis, tandem mass spectrometry (MS/MS) is a popular and efficient method in glycoproteomics because of its high sensitivity and selectivity.

Glycosylation heterogeneity

Glycoproteins generally exist as populations of glycosylated variants (glycoforms) of a single polypeptide. Although the same glycosylation machinery is available for all proteins that enter the secretory pathway in a given cell, most glycoproteins emerge with characteristic glycosylation patterns, and the glycans at each glycosylation site are heterogeneous. The recognition of identical motifs in different glycans allows a heterogeneous population of glycoforms to participate in specific biological interactions. This heterogeneity is the most challenging factor for glycan analysis. Two major types of protein glycosylation are known. N-linked glycans are attached to asparagine residues within asparagine-X-serine/threonine sequons (N-X-S/T), where X is any amino acid except proline. O-linked glycans, which are attached to the hydroxyl oxygen of serine, threonine, tyrosine, hydroxylysine, or hydroxyproline side chains, or to oxygen atoms on lipids such as ceramide, represent the second type of modification.
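The N-X-S/T rule lends itself to a simple pattern search. The following is a minimal Python sketch; the function name `find_sequons` and the example sequence are illustrative and are not taken from any tool reviewed here. It lists candidate N-glycosylation sites in a protein sequence. A sequon marks only a potential site and does not show that the site is actually occupied.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead keeps the match zero-width after N, so overlapping
# sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues inside N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence used only for illustration.
    print(find_sequons("MNGSANPTNKT"))
```

In the example sequence, the asparagine in "NPT" is skipped because proline occupies the X position.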

Glycoprotein analysis strategies for mass spectrometry

Mass spectrometry (MS) has been successfully used to determine glycan compositions and structures.3–5) Figure 1 shows MS strategies that are currently in use for glycoprotein analysis. These approaches can generally be divided into top-down and bottom-up strategies. The determination of the molecular weight of an intact glycoprotein represents a typical top-down analysis, which provides the most direct method for obtaining information on the glycans of a glycoprotein. By calculating the molecular weight differences between the peaks, it is possible to determine the types of glycan modifications on that protein. Such an analysis, however, frequently lacks sensitivity and structural information. Because of this, two main strategies are commonly used to collect glycoprotein information by MS. One involves an analysis of released glycans, while the other involves characterizing glycopeptides obtained by proteolytic digestion of the original glycoprotein.
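The idea of reading glycan modifications from mass differences between peaks can be illustrated with a short Python sketch. It assumes a list of deconvoluted intact-protein masses and compares every pairwise difference with approximate average residue masses of common monosaccharides. The function name, the tolerance, and the example masses are illustrative assumptions, not values from the studies cited here.

```python
# Approximate average residue masses (Da) of common monosaccharides.
AVERAGE_RESIDUE_MASS = {
    "Hex": 162.14,
    "HexNAc": 203.19,
    "dHex": 146.14,
    "NeuAc": 291.26,
}


def annotate_mass_deltas(masses, tol=1.0):
    """Label pairs of masses whose difference matches one monosaccharide.

    masses: deconvoluted masses in Da (assumed input).
    tol:    absolute tolerance in Da (an illustrative default).
    """
    ordered = sorted(masses)
    annotations = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            delta = high - low
            for name, residue in AVERAGE_RESIDUE_MASS.items():
                if abs(delta - residue) <= tol:
                    annotations.append((low, high, name))
    return annotations


if __name__ == "__main__":
    # Hypothetical glycoform masses used only for illustration.
    for low, high, name in annotate_mass_deltas([50000.0, 50162.1, 50365.3]):
        print(f"{low} -> {high}: +{name}")
```

Only single-residue differences are considered, so a gap that corresponds to two or more residues is left unannotated.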

Fig. 1. Strategies for glycoprotein analysis using mass spectrometry.

Previous reviews have dealt with software tools that permit glycosylation analysis based on MS data alone, as well as software designed to speed up the interpretation of glycan and glycopeptide fragmentation from MS2 data.1) Compared to previous reviews, this review not only describes new bioinformatics tools for glycan analysis, but also focuses on software applications for glycopeptide analysis and glycan profiling.

BIOINFORMATIC TOOLS FOR THE ANALYSIS OF RELEASED GLYCANS

The analysis of glycans released from a glycoprotein by enzymatic digestion is the most common method for characterizing glycoproteins. Although this approach provides less site-specific information than glycopeptide analysis, several bioinformatics tools that can be used in conjunction with it are listed below.

MS approaches for released glycan analysis

Goldberg et al. described a software program called “Cartoonist” for profiling N-glycans based on MALDI-TOF mass spectra. Cartoonist has some interesting features. It uses a library of about 2800 biosynthetically plausible cartoons derived from 300 archetypes. In addition, it computes a confidence score for each assignment of a glycan to a peak according to the mass match.6) “GlycoWorkbench” was developed by the EUROCarbDB initiative.7) In this system, MS data can be interpreted by using GlycoWorkbench annotations. By comparing a theoretical list of fragment masses with a list of peaks derived from the spectrum, GlycoWorkbench provides glycan structural constituents based on a collection of fragmentation types.8) “SysBioWare” is a web-based interface that allows raw MS data to be processed. The distinctive functions of SysBioWare are peak isotopic grouping, charge deconvolution, and continuous wavelet analysis. Candidate structures are generated by this algorithm based on a database search.9) Yu et al. presented a software tool called “MultiGlycan” that performs automated annotation of a user-defined list of glycans based on MALDI-TOF or LC-ESI-MS data. MultiGlycan functions by matching theoretical isotopic envelopes of glycans and accounts for overlapping glycan isotope distributions. For ESI data in particular, isotopic envelopes can be calculated simultaneously for a specified adduct or for multiple adducts. Both label-free and labeled glycans can be processed by MultiGlycan for glycan quantification.10,11)
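Mass matching of the kind described above starts from the theoretical m/z of a glycan composition with a given adduct. The sketch below computes that value for native (underivatized) glycans using monoisotopic residue masses. It is not the implementation of MultiGlycan or of any other tool named here; the function names are illustrative, and isotopic envelopes are not modeled.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "dHex": 146.057909,
    "NeuAc": 291.095417,
}
WATER = 18.010565
# Mass added per charge for each adduct ion (electron mass accounted for).
ADDUCT_MASS = {"H": 1.007276, "Na": 22.989218}


def glycan_neutral_mass(composition):
    """Neutral monoisotopic mass of a free, underivatized glycan."""
    residues = sum(RESIDUE_MASS[name] * count
                   for name, count in composition.items())
    return residues + WATER


def adduct_mz(composition, adduct="Na", charge=1):
    """m/z of [M + charge*adduct]^(charge+) for the given composition."""
    mass = glycan_neutral_mass(composition)
    return (mass + charge * ADDUCT_MASS[adduct]) / charge


if __name__ == "__main__":
    man5 = {"Hex": 5, "HexNAc": 2}  # high-mannose Man5 composition
    print(round(adduct_mz(man5, "Na", 1), 4))
    print(round(adduct_mz(man5, "H", 2), 4))
```

Permethylated or otherwise labeled glycans would need different residue and end-group masses, which this sketch does not cover.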

MS/MS approaches for released glycan analysis

“STAT,” which was designed by Gaucher et al., represents the first web-based computational program for the determination of glycan composition using MS/MS spectra. STAT can rapidly extract sequence information from a set of MS/MS spectra for glycans of up to ten monosaccharide residues. When more than one candidate glycan matches the data, STAT lists the possible structures and ranks them by how likely each is to be the correct sequence.12) The “StrOligo” program can successfully analyze tandem mass spectra of complex N-linked oligosaccharides. StrOligo finds the most intense peak in a tandem mass spectrum and compares the experimental isotopic distribution around it to one or two theoretical isotopic distributions. Several optimized parameters are used by this algorithm to remove isotopic peaks and retain monoisotopic peaks: the overall intensities of both isotopic distributions, the position of the first isotopic distribution, and the m/z separation between the two distributions.13) “GlycoFragment” generates all theoretically possible A-, B-, C-, X-, Y- and Z-fragments of a defined glycan structure according to the nomenclature of Domon and Costello.14) The algorithm uses the Sweet-II15,16) program to interpret nomenclature and chooses suitable templates from a database according to linkage information. “GlycoSearchMS” compares an imported mass spectrum against a database of theoretically calculated spectra and identifies the best-matching candidate spectra. It includes most of the theoretically possible spectra of N- and O-glycans, which were extracted from SweetDB.17,18)
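To make the Domon and Costello fragment types concrete, the sketch below computes the glycosidic-bond fragments (B, C, Y and Z) of a linear, underivatized, singly protonated glycan with a free reducing end. These restrictions are simplifying assumptions made here: branched structures and the cross-ring A and X fragments are not handled, and the code is not derived from GlycoFragment.

```python
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "dHex": 146.057909,
    "NeuAc": 291.095417,
}
WATER = 18.010565
PROTON = 1.007276


def glycosidic_fragments(chain):
    """Singly protonated B/C/Y/Z ions of a linear glycan.

    chain lists residues from the non-reducing end to the reducing end.
    Cleaving between residue i and i+1 gives B_i/C_i (non-reducing side)
    and Y_(n-i)/Z_(n-i) (reducing side).
    """
    n = len(chain)
    fragments = {}
    for i in range(1, n):
        b = sum(RESIDUE_MASS[r] for r in chain[:i]) + PROTON
        y = sum(RESIDUE_MASS[r] for r in chain[i:]) + WATER + PROTON
        fragments[f"B{i}"] = b
        fragments[f"C{i}"] = b + WATER   # C keeps the glycosidic oxygen
        fragments[f"Y{n - i}"] = y
        fragments[f"Z{n - i}"] = y - WATER
    return fragments


if __name__ == "__main__":
    for name, mz in glycosidic_fragments(["Hex", "Hex", "HexNAc"]).items():
        print(name, round(mz, 4))
```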

The Oligosaccharide Subtree Constraint Algorithm (“OSCAR”) assembles analyst-selected fragments into branching or linkage glycan structures and provides a de novo algorithm for identifying a glycan structure without being limited by presumed biosynthetic structures. In particular, OSCAR can interpret MSn data for permethylated O-linked oligosaccharides; a disadvantage is that it cannot readily process LC-MS data.19,20) Goldberg et al. also described an algorithm called “CartoonistTwo” that generates all possible cartoons and ranks them by score, similar to Cartoonist, but with a more sophisticated scoring function.6) CartoonistTwo permits fragment ions of O-linked glycans to be annotated automatically.21) “Glyco-Peakfinder” is useful for the de novo determination of glycan compositions. Its computation does not rely on prior biological knowledge or on fragmentation information. Fragment ions arising from monosaccharide cross-ring cleavage, as well as multiply charged ions, can be annotated using Glyco-Peakfinder.22) “SimGlycan” is a commercially available software program that accepts raw MS/MS data files obtained from many different types of mass spectrometers. SimGlycan has a built-in database for glycan searches based on accurate mass spectra and uses special scoring techniques to propose the most likely glycan structures.23) “GlycanID” is a software program that can be used for the analysis of LC-MS/MS data for profiling and identifying glycans. A glycan profile is generated with feature detection and alignment tools developed for proteomics. A key feature of GlycanID is its ability to handle the complexity of fragment ions caused by salt-adducted ions or contaminants, multiple charge states, and possible in-source decay.24)
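De novo composition determination, in the sense of finding monosaccharide counts that explain a mass without biological constraints, can be sketched as a brute-force enumeration. The example below is a simplified illustration under stated assumptions (native glycans, four residue types, arbitrary upper bounds on residue counts) and does not reproduce the algorithm of Glyco-Peakfinder or any other tool mentioned above.

```python
from itertools import product

RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "dHex": 146.057909,
    "NeuAc": 291.095417,
}
WATER = 18.010565


def candidate_compositions(neutral_mass, tol=0.01, max_counts=None):
    """Enumerate compositions whose mass lies within tol (Da) of the input.

    max_counts bounds the search space; the defaults are arbitrary
    illustrative limits, not biologically derived ones.
    """
    if max_counts is None:
        max_counts = {"Hex": 12, "HexNAc": 8, "dHex": 4, "NeuAc": 4}
    names = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        mass = sum(RESIDUE_MASS[n] * c for n, c in zip(names, counts)) + WATER
        if abs(mass - neutral_mass) <= tol:
            # Report only residues that are actually present.
            hits.append({n: c for n, c in zip(names, counts) if c})
    return hits


if __name__ == "__main__":
    for composition in candidate_compositions(1234.4334):
        print(composition)
```

Because no biosynthetic rules are applied, wide tolerances or large masses can return several isobaric or near-isobaric compositions, which is why MS/MS data are usually needed to choose among them.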

In summary, “Glyco-Peakfinder” can be readily used for released glycan analysis with both the MS and the MS/MS approaches (http://www.glycoworkbench.org/).

AUTOMATED ANALYSIS OF GLYCOPEPTIDES

Glycopeptide-based analysis informs researchers of the nature of the glycans on one or more proteins.25) The key attraction of the method is its ability to link glycosylation information to exact locations (glycosylation sites) on proteins. Some commonly used tools for glycopeptide identification, divided into those for MS and those for MS/MS analysis of N-linked or O-linked glycopeptides, are listed below.

Glycopeptide MS Data

“GlycoMod” is a software program that can help researchers find all possible compositions of a glycan structure from their experimental data. The program can be used to determine the compositions of glycans with underivatised reducing ends as well as of glycans derivatised by methylation or acetylation of the monosaccharides. The algorithm matches the experimentally determined masses against in silico predicted enzymatic peptides from the SWISS-PROT or TrEMBL databases that have the potential to be glycosylated with either N- or O-linked glycans.26) “GlycoX” not only assigns site-specific glycosylation without any prior glycan information, but also provides information on glycan heterogeneity. The GlycoX program has three main functions: (1) Isotope Filter, (2) Oligosaccharide Calculator, and (3) Determination of Glycosylation Sites. It supports the interpretation of MS data obtained for glycoprotein fragments produced by nonspecific protease treatment.27) Desaire and colleagues presented a web-based tool, “GlycoPep DB.” Its principle is similar to that of GlycoMod: it is designed to find all possible compositions for glycopeptides by comparing experimentally measured masses with glycopeptide masses calculated from a carbohydrate database of N-linked glycans. In comparison with GlycoMod, GlycoPep DB can process data for multiply charged ions and makes glycopeptide compositional assignment more efficient through the concept of “smart searching.”28) “GlycoSpectrumScan” was developed by Deshpande et al. It computes the masses of all possible glycopeptides, similar to GlycoMod and GlycoPep DB. Notably, relative abundances based on signal intensities in a mass spectrum can also be calculated.29)
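The mass-matching principle shared by these tools can be sketched as follows: an observed m/z and charge are converted to a neutral mass, which is then compared with every peptide-plus-glycan combination. The peptide names and masses below are hypothetical placeholders, the glycan masses are residue-mass sums (the glycan is attached without its reducing-end water), and the function names are illustrative rather than taken from GlycoMod or GlycoPep DB.

```python
PROTON = 1.007276


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion carrying `charge` protons."""
    return charge * (mz - PROTON)


def match_glycopeptides(mz, charge, peptides, glycans, tol_ppm=10.0):
    """Return (peptide, glycan, ppm error) for all combinations within tol.

    peptides: name -> neutral monoisotopic peptide mass (Da)
    glycans:  name -> sum of monosaccharide residue masses (Da)
    """
    observed = neutral_mass(mz, charge)
    matches = []
    for pep_name, pep_mass in peptides.items():
        for gly_name, gly_mass in glycans.items():
            theoretical = pep_mass + gly_mass
            ppm = (observed - theoretical) / theoretical * 1e6
            if abs(ppm) <= tol_ppm:
                matches.append((pep_name, gly_name, ppm))
    return matches


if __name__ == "__main__":
    # Hypothetical peptide masses; glycan values are residue-mass sums.
    peptides = {"PEP_A": 1000.5, "PEP_B": 1500.7}
    glycans = {"Hex5HexNAc2": 1216.422866, "Hex6HexNAc2": 1378.475690}
    for pep, gly, ppm in match_glycopeptides(1109.468709, 2, peptides, glycans):
        print(pep, gly, f"{ppm:+.2f} ppm")
```

Handling the charge state explicitly is what allows multiply charged ions, such as those produced by ESI, to be searched against the same table of calculated masses.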

Of these tools, “GlycoPep DB” is user-friendly and freely available (http://hexose.chem.ku.edu/sugar.php).

Glycopeptide MS/MS Data

“GlycoPep ID” is a freely accessible web-based program that was specifically developed to identify the peptide portions of glycopeptides. The glycopeptides may be generated by proteolytic cleavage with either a specific or a nonspecific enzyme. Once the glycosylation sites are determined, the program generates a table of all possible peptide sequences around these sites. Its most notable feature is that negatively charged glycopeptides can also be identified.30) “GlycopeptideID” is a web tool developed to identify intact glycopeptides. The emphasis is on resolving cases in which the peptide is complicated and the glycan is unknown. Peptides are identified by matching the MS/MS spectra against a protein database, and glycans by matching against a glycan database created from GlycomeDB and pre-loaded on the GlycopeptideID web server. The features of GlycopeptideID include a de novo glycan search, assignment of the peptide and glycan modification, and probability-based scoring for both peptide and glycan.31,32) “GlycoMiner” was developed by Ozohanics et al. It automatically identifies tandem mass spectra that correspond to N-glycopeptides by evaluating low-mass oxonium ions, deduces oligosaccharide losses from the protonated molecule, and identifies the mass of the peptide residue. A disadvantage of GlycoMiner is that glycan composition assignments become unreliable when the quality of the input spectra is low.33) “Protein Prospector” combines CID and ETD search results into a single output file for glycopeptide identification by a database search. In addition, the software permits a manual comparison of the potential site assignments and the annotation of the glycosylated and de-glycosylated fragment ions in the same spectrum, which is a particularly useful feature for CID spectra of O-linked glycopeptides, where both fragment types are generally present.34)
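Screening MS/MS spectra for low-mass oxonium ions, as mentioned above for GlycoMiner, can be illustrated with a short sketch. The list of diagnostic ions uses commonly cited m/z values, while the tolerance, the relative-intensity threshold, and the requirement of at least two hits are arbitrary assumptions chosen here for illustration. The code does not reproduce GlycoMiner or any other tool.

```python
# Commonly used diagnostic oxonium ions (singly charged, m/z).
OXONIUM_IONS = {
    "HexNAc fragment": 138.0550,
    "Hex": 163.0601,
    "HexNAc": 204.0867,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def oxonium_hits(peaks, tol=0.01, min_rel_intensity=0.05):
    """Names of oxonium ions found in a peak list of (m/z, intensity)."""
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    hits = []
    for name, target in OXONIUM_IONS.items():
        for mz, intensity in peaks:
            if abs(mz - target) <= tol and intensity >= min_rel_intensity * base:
                hits.append(name)
                break
    return hits


def looks_like_glycopeptide(peaks, min_hits=2, **kwargs):
    """Flag a spectrum when enough distinct oxonium ions are present."""
    return len(oxonium_hits(peaks, **kwargs)) >= min_hits


if __name__ == "__main__":
    # Hypothetical peak list used only for illustration.
    spectrum = [(204.0866, 1000.0), (366.1398, 500.0), (500.3, 800.0)]
    print(oxonium_hits(spectrum), looks_like_glycopeptide(spectrum))
```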

Mayampurath et al. modified the “GlypID” algorithm35) and developed a new software tool that implements several algorithmic approaches for utilizing MS information, including accurate precursor mass and spectral patterns from both HCD and CID spectra, thus allowing for an unequivocal and accurate characterization of N-linked glycosylation sites of proteins.36) “GlycoPeptide Search (GPS)” is a tool that simplifies the interpretation of N-glycopeptide CID MS/MS data in conjunction with the GlycomeDB database.37) The reported results must contain not only glycopeptide oxonium ions, but also glycopeptide fragment ions that are consistent with the intact mass of the peptides.38) The “Glycolyzer” contains a full data analysis pipeline in one software package to allow for minimal user intervention. The general modules of this algorithm include data importing and exporting, FT-ICR signal preprocessing, internal calibration, noise threshold calculation, peak selection, isotope grouping and filtering, glycan annotation, intensity normalization, missing value filling, multiple spectra averaging, hypothesis testing, and multiple testing corrections. It can be applied to the identification of glycan biomarkers.39)

“GlycoPep Grader (GPG)” is a free software tool designed to analyze MS/MS data obtained for N-linked glycopeptides. The scoring approach relies on the identification of unique dissociation patterns shown by high mannose, hybrid, and complex N-linked glycan types, including patterns that are specific to structures that contain fucose or sialic acid residues. A useful aspect of this tool is that its scoring algorithm was specifically designed to deal with low-resolution CID data.40) “Byonic” is a commercially available software package that can be used for identifying peptides and proteins by tandem mass spectrometry. Byonic provides the following features: top-down and bottom-up search, Modification Fine Control™, Wildcard Search™, glycopeptide search, sequence variant search, and modification site localization. The glycopeptide search can be specified in three ways: internal tables, external tables, and fine control modification.41) “GlycoMaster DB” incorporates an N-linked glycan database that was extracted from GlycomeDB. The GlycoMaster DB algorithm involves three steps: filtration of glycopeptide spectra, glycan assignment, and peptide identification. HCD/ETD spectra can be used for glycopeptide analysis with GlycoMaster DB, which permits glycopeptides and deglycosylated glycopeptides to be analyzed simultaneously to obtain glycans and peptide sequences. It also provides a GSM score scheme, similar to the PSM score in proteomics.42)

“GPQuest” is an algorithm used for the site-specific identification of intact glycopeptides using higher-energy collisional dissociation (HCD). The GPQuest algorithm works by matching the spectra of HCD-fragmented glycopeptides against an experimental spectral library (ESL). The advantage of GPQuest is that it can analyze tandem mass spectra of intact glycopeptides that contain oxonium ions and some of the fragment ions that HCD fragmentation generates from their deglycosylated counterparts.43) “MAGIC” is a mass spectrometry-based automated glycopeptide identification platform that permits peptide sequences and glycan compositions to be identified directly by means of a conventional database sequence search. MAGIC uses the Y1 (peptide Y0 + GlcNAc) ion for filtering out unknown glycoproteins, with a novel algorithm called Trident that detects a triplet pattern from the fragmentation of the common trimannosyl core of N-linked glycopeptides. MAGIC also provides fast computing power for large-scale complex proteome data sets.44) “GlycoSeq” uses a heuristic iterated glycan sequencing algorithm that incorporates prior knowledge of the N-linked glycan synthetic pathway to achieve rapid glycan sequencing. A limitation of GlycoSeq is that glycopeptides containing more than one potential glycosylation site cannot be distinguished, owing to the lack of signature fragment ions that carry peptide backbone information.45) “pGlyco” is a novel pipeline for the identification of intact glycopeptides that integrates HCD-MS/MS, CID-MS/MS and MS3 information. Both HCD-MS/MS and CID-MS/MS can be used to optimize glycopeptide identification with false discovery rate estimation. Data-dependent acquisition of MS3 for the most intense peaks of HCD-MS/MS is used to provide fragments that identify the peptide backbones.46)
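Because the Y1 ion corresponds to the peptide (Y0) plus a single GlcNAc, observing it allows the precursor mass to be split into a peptide part and a glycan part, after which the peptide can be searched in a conventional sequence database. The sketch below shows only this arithmetic, under the assumption that the charge states of the precursor and of the Y1 ion are known. It does not implement Trident or any other part of MAGIC, and the function names and example values are illustrative.

```python
PROTON = 1.007276
HEXNAC = 203.079373  # monoisotopic residue mass of HexNAc (GlcNAc)


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion carrying `charge` protons."""
    return charge * (mz - PROTON)


def split_glycopeptide_mass(precursor_mz, precursor_charge, y1_mz, y1_charge=1):
    """Return (peptide mass, glycan residue mass) from precursor and Y1.

    The glycan value is the sum of monosaccharide residue masses,
    including the core GlcNAc that remains on the Y1 ion.
    """
    total = neutral_mass(precursor_mz, precursor_charge)
    peptide = neutral_mass(y1_mz, y1_charge) - HEXNAC
    glycan = total - peptide
    return peptide, glycan


if __name__ == "__main__":
    # Hypothetical values: a 1000.5 Da peptide carrying Hex5HexNAc2.
    peptide, glycan = split_glycopeptide_mass(1109.468709, 2, 1204.586649, 1)
    print(round(peptide, 4), round(glycan, 4))
```

The glycan mass obtained this way could then be interpreted as a composition, while the peptide mass constrains the sequence search.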

Among the tools in this “Glycopeptide MS/MS Data” section, “GlycoPeptide Search (GPS)” is free software for glycopeptide tandem mass spectrum database searches and can be downloaded from https://edwardslab.bmcb.georgetown.edu/trac/GlycoPeptideSearch/.

CONCLUSION AND PERSPECTIVES

In recent years, mass spectrometry has become a powerful tool for the identification and structural characterization of glycans. As mass spectrometers have developed, research has converged on two main strategies for studying protein glycosylation: glycan analysis and glycopeptide analysis. Compared with glycan analysis, glycopeptide identification by tandem mass spectrometry provides site-specific glycan information, because each glycan is observed together with the sequence of the enzymatically digested peptide that carries it. Bioinformatic tools for glycopeptide analysis are advancing rapidly and promise to continue to expand, since they provide for the smooth and relatively rapid analysis of glycosylation end products.

REFERENCES
 
© 2017 Pei-Lun Tsai and Sung-Fang Chen. This is an open access article distributed under the terms of Creative Commons Attribution License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.