Proceedings of the Japan Academy, Series B
Online ISSN : 1349-2896
Print ISSN : 0386-2208
ISSN-L : 0386-2208
Reviews
The complexity of glycoprotein-derived glycans
Johannes F. G. VLIEGENTHART

2017 Volume 93 Issue 2 Pages 64-86

Abstract

A brief review is presented of our studies on the structure of glycoprotein-derived glycans. The emphasis is on the introduction of high-resolution 1H-NMR spectroscopy for the unambiguous determination of primary structures. For this purpose, we developed the structural reporter group concept. Structural reporters are defined as unique markers of structural elements in the NMR spectra. Application of this concept led to the discovery of numerous new structures. Furthermore, a number of structures presented in the literature could be corrected. The results are relevant for insight into the various steps of glycan metabolism in health and disease, for the function and mode of action of glycans in vivo, and for the interpretation of structural information obtained through other techniques. The strength of the approach is further shown for several highly complex glycoproteins carrying very heterogeneous and complicated glycans.

Introduction

Glycosylation is one of the most prominent post- and co-translational modifications of proteins. The attachment of covalently linked glycans to a protein is a non-template-driven process, carried out by the interplay of sequentially acting specific glycosyltransferases and glycosidases. The latter enzymes play a main role in the trimming of intermediary structures during biosynthesis. The glycans can be attached to the protein at one or more glycosylation sites via N, O, C or S atoms in the amino acid side chains. In N-glycoproteins, the linkage mainly occurs via the amide side chain of an asparagine that forms part of the consensus sequence Asn-Xxx-Ser/Thr, wherein Xxx can be any amino acid except Pro. O-glycans are linked via the side chain of hydroxy amino acids.1)–3) Up to now, C-linked mannose is the only observed example of a C-linked glycan. Mannose is exclusively attached to the indole nucleus of tryptophan.4),5) S-glycosylation is rare, and only observed in glycopeptides.6) Due to its nature, the biosynthesis of glycoproteins can give rise to the formation of a plethora of compounds that have identical protein chains in common, but differ in the structure of the glycan chains and/or in the occupancy of the glycosylation sites. This feature constitutes the so-called (micro)heterogeneity of glycoproteins. Consequently, a glycoprotein usually comprises more than one molecular structure. The heterogeneity complicates single-molecule studies and the assignment of a biological or physical function to a specific glycan structure. A further complication in assigning a specific function to a single glycan structure is that the same glycan may occur in various types of glycoproteins. These biomacromolecules may be excreted, function as membrane constituents or be part of the extracellular matrix. Therefore, glycans having the same primary structure might differ in function due to their different presentation to the environment. Defining an in vivo function at the molecular level may require knowledge of the molecular context/architecture wherein the glycan exerts its function.
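The consensus sequence for N-glycosylation described above lends itself to a simple pattern search. The following is a minimal sketch, assuming a protein sequence in one-letter code; the function name and the toy sequence are hypothetical and serve only as illustration. Note that the presence of a sequon is a necessary, not a sufficient, condition for glycosylation, since sites may remain unoccupied.

```python
import re

# Asn-Xxx-Ser/Thr sequon, where Xxx is any residue except Pro.
# The lookahead ensures that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return the 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any real protein.
toy_sequence = "MKNATLLNPSAANGTNNSS"
print(find_sequons(toy_sequence))  # -> [3, 13, 16, 17]
```

In the toy sequence, Asn 8 is skipped because it is followed by Pro, whereas the adjacent Asn 16 and Asn 17 are both reported because each starts its own sequon.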

To gain fundamental insight into the structure-function relation of the glycans of a glycoprotein, and of the glycoprotein as a molecular entity, structural information is needed. A first step is the determination of the primary structure. This comprises a number of parameters, including i) the structure of the protein chain, ii) the structure of the glycan chains, iii) the location of the glycosylation site(s) in the protein and iv) the degree of occupation of the glycosylation sites. Finally, since the molecular environment at the place of action influences the function, definition of the molecular embedding is a relevant aspect; this holds in particular for membrane-bound glycoproteins.

This brief review presents a selection of the contributions of our research group to the study of the structure of glycans of glycoproteins to illustrate the large variety in glycan-structure and in possible function. The attention is focused mainly on results we obtained by our introduction of high-resolution 1H-NMR spectroscopy for structural analysis.7) This method opened a new approach in structural studies.

Primary structure of glycans.

An important issue concerns the isolation of a ‘pure’ glycoprotein as starting material. This also includes the proper selection of well-defined cells and/or tissues. In recent years, the development of advanced chromatographic methods has brought great improvements, although the aforementioned (micro)heterogeneity leads almost by definition to a family of compounds. Even with the increased sensitivity and resolving power of up-to-date analytical techniques, it remains of crucial importance to distinguish properly between natural heterogeneity and heterogeneity arising from contaminating glycoproteins. Glycan analyses can rarely be performed on the intact glycoprotein. Therefore, the glycoprotein is degraded to partial structures by means of enzymatic or chemical reactions. The fragments, which may be oligosaccharides, oligosaccharide-alditols or glycopeptides, are then isolated and purified, and their structures determined. Subsequently, the glycoprotein structure can be deduced if sufficient fragments have been analysed.

For a long time, structure analysis of the glycan moieties made only slow progress, due to limitations of the analytical methods used. The breakthrough in this area resulted from the development of high-resolution instrumentation and analytical techniques like nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS). Sophistication of the methodology was a prerequisite for the structure determination of complex glycans, since a large number of parameters have to be established to identify the structure:

  • Nature and number of the constituting monosaccharides.
  • Ring size and absolute configuration of the monosaccharides.
  • Sequence of the monosaccharides.
  • Type and anomeric configuration of the glycosidic linkages.
  • Type of the carbohydrate-amino acid linkage and position of the corresponding amino acid in the protein.
  • Type and position of non-carbohydrate substituents.
  • Occupation of the glycosylation sites.

We focused on the development of an integral methodology leading to unambiguous answers for the structure of glycans. To achieve this goal, information from more than one independent method is generally needed. We chose NMR and, if feasible, MS, in combination with chemical techniques. We established that methanolysis of glycoproteins, followed by monosaccharide analysis, e.g., by GLC of suitable derivatives, is adequate for establishing the identity of the constituting monosaccharides.8) A new method was designed for determining the absolute configuration of monosaccharides (D or L) based on GLC of diastereomers. These are obtained through derivatization with (−)-2-butanol.9),10) This step is particularly relevant for non-mammalian systems, since both enantiomers may even occur in one glycan. This is illustrated in Fig. 1 for an O-glycopeptide isolated from the venom of the piscivorous cone snail Conus consors, wherein indeed both D- and L-galactose residues occur.11)

Fig. 1.

Structure of glycopeptide CeTx, obtained from Conus consors. Note that the glycan contains D- as well as L-Gal residues.

For determination of the position of glycosidic linkages, we applied methylation analysis.12) Analysis of the occupation of glycosylation sites is performed by analysis of (glyco)peptides.

Mass spectrometry.

The advances in mass spectrometry are impressive. In particular, the progress made in the mass range that can be analysed, the developments in ionization techniques and, importantly, the advancement in glyco-bioinformatics have rendered the analysis of carbohydrates by mass spectrometry feasible. We could show this for the analysis of mono- and oligosaccharides as pertrimethylsilyl or permethyl derivatives with electron impact ionization; a significant number of structural details can be elucidated.13) The introduction of Fast Atom Bombardment (FAB) ionization further enlarged the mass range of compounds that can be analysed. We could show this for FAB mass spectrometry in the positive as well as the negative mode for underivatized oligosaccharides and glycopeptides obtained from glycoproteins.14) In the past years, major steps forward in instrumentation and ionization methods have been realized, enabling the study of intact glycoproteins. For example, the molecular multiplicity of glycoproteins has recently been illustrated for intact chicken ovalbumin. High-resolution native electrospray ionization mass spectrometry on a modified Exactive Orbitrap mass analyser demonstrated qualitatively and semi-quantitatively 59 proteoforms in the natural protein, induced by 45 different glycan structures and a number of phosphorylation sites.15)
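To illustrate what a mass measurement does and does not reveal, the following minimal sketch computes the monoisotopic mass of a free, underivatized glycan from its monosaccharide composition. The residue masses are standard values that are not taken from this review, and the function and dictionary names are hypothetical. Since isomeric glycans share one composition and hence one mass, such a calculation cannot by itself establish linkage or branching, which is where NMR and chemical methods come in.

```python
# Monoisotopic residue masses in Da (monosaccharide minus H2O).
# Standard values, assumed here; they do not come from the review itself.
RESIDUE_MASS = {
    "Hex": 162.0528,     # e.g. Man, Gal, Glc
    "HexNAc": 203.0794,  # e.g. GlcNAc, GalNAc
    "dHex": 146.0579,    # e.g. Fuc
    "Neu5Ac": 291.0954,
}
WATER = 18.0106  # added once for the free reducing glycan


def glycan_mass(composition: dict[str, int]) -> float:
    """Monoisotopic mass of a free, underivatized glycan of given composition."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items()) + WATER


# Man9GlcNAc2 expressed as a composition: 9 hexoses and 2 N-acetylhexosamines.
man9 = {"Hex": 9, "HexNAc": 2}
print(round(glycan_mass(man9), 3))  # -> 1882.645
```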

In a joint program with R. Schauer et al., we applied MS in conjunction with 1H-NMR successfully for identification of the structure of sialic acid residues.16),17) Sialic acids proved to form a large family of compounds that arises from variation in substituents at different positions in the basic structure.18) This is shown in Fig. 2.

Fig. 2.

Structure of neuraminic acid. Possible substituents are: R4: OH or OAc. R5: NH2, NAc, NGl or OH. R7: OH or OAc. R8: OH, OAc, OCH3, SO4 or Sia. R9: OH, OAc, O-lactyl, PO4, SO4 or Sia.
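The size of the sialic acid family can be made tangible by enumerating the substituents listed in the caption of Fig. 2. The sketch below is a formal count only: it assumes that the substituents at the five positions vary independently, which is an assumption made here for illustration. The result is therefore an upper bound on substitution patterns, not the number of sialic acids actually found in nature.

```python
from itertools import product

# Substituents per position, copied from the caption of Fig. 2.
SUBSTITUENTS = {
    "R4": ["OH", "OAc"],
    "R5": ["NH2", "NAc", "NGl", "OH"],
    "R7": ["OH", "OAc"],
    "R8": ["OH", "OAc", "OCH3", "SO4", "Sia"],
    "R9": ["OH", "OAc", "O-lactyl", "PO4", "SO4", "Sia"],
}


def substitution_patterns():
    """Yield every formal combination of substituents as a dict."""
    positions = list(SUBSTITUENTS)
    for combo in product(*(SUBSTITUENTS[p] for p in positions)):
        yield dict(zip(positions, combo))


n_patterns = sum(1 for _ in substitution_patterns())
print(n_patterns)  # 2 * 4 * 2 * 5 * 6 = 480
```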

One-dimensional (1D) 1H-NMR spectra.

For elucidation of glycan structures, we introduced high-resolution (360–900 MHz) 1H-NMR spectroscopy. NMR spectroscopy is non-destructive, thereby leaving all possibilities open for additional analyses. The NMR spectra disclose the greater part of the many structural details and mostly allow the deduction of unambiguous structures. The starting set of 1H-NMR data, in terms of chemical shifts and coupling constants, could be collected and verified thanks to the availability of many partial structures of glycoprotein-glycans isolated from the urine of patients with inborn errors of glycan metabolism like oligomannosidosis, sialidosis, fucosidosis, Gaucher’s and Sandhoff’s diseases. To record 1H-NMR spectra we dissolve glycan samples in 2H2O to exchange labile 1H for 2H. Subsequently, spectra of glycans are recorded in 2H2O solution. For the interpretation of these spectra we created the ‘structural reporter group concept’.19)–24) The 1D 1H-NMR spectra contain a composite, so-called bulk signal, from about δ 3.2–3.9 ppm, mainly representing ring protons with similar chemical shifts that cannot be assigned easily to individual atoms. For unravelling the bulk signal and other complex signals, techniques like homo- and heteronuclear 2D NMR spectroscopy are applied. Outside the bulk region, well-defined resonances can be distinguished, which are characteristic of structural elements. These signals represent the structural reporter group signals that can be used as structural identifiers. The main structural reporter groups comprise:

  • Anomeric protons.
  • Protons shifted out of the bulk region due to glycosylation shifts or due to the influence of substituents, such as sulfate, phosphate, alkyl and acyl groups.
  • Deoxy sugar protons.
  • Protons of alkyl and acyl substituents like methyl, acetyl, glycolyl etc.
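The first step in applying the structural reporter group concept, separating resonances inside the bulk region from those outside it, can be sketched as follows. The limits of 3.2 and 3.9 ppm are the approximate values given in the text; the function name and the peak list are hypothetical and do not represent measured data. Assigning the signals outside the bulk region to specific reporter groups still requires reference data such as those compiled in refs. 19 to 24.

```python
# Approximate limits of the bulk region in ppm, as quoted in the text.
BULK_REGION = (3.2, 3.9)


def split_by_bulk_region(shifts_ppm):
    """Split chemical shifts into bulk signals and reporter-group candidates."""
    low, high = BULK_REGION
    bulk = [d for d in shifts_ppm if low <= d <= high]
    candidates = [d for d in shifts_ppm if d < low or d > high]
    return bulk, candidates


# Hypothetical peak list in ppm, for illustration only.
peaks = [5.07, 4.61, 4.25, 3.78, 3.65, 3.45, 2.05, 1.21]
bulk, candidates = split_by_bulk_region(peaks)
print(bulk)        # -> [3.78, 3.65, 3.45]
print(candidates)  # -> [5.07, 4.61, 4.25, 2.05, 1.21]
```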

Examples of 1D 1H NMR spectra are shown in Fig. 3a,b.

Fig. 3.

(a) 1D 1H-NMR spectrum of Man9GlcNAc2 recorded in 2H2O. The structural reporter groups of the individual Man residues are assigned. (b) 1D 1H-NMR spectrum of the disialylated diantennary N-glycan, recorded in 2H2O. The relevant reporter group signals, such as the NH, anomeric, Man H-2, N-acetyl and Neu5Ac H-3 signals, are indicated. Note the long-distance effect of the reducing GlcNAc-2.

General features of N-glycoproteins and the corresponding 1H-NMR data.

In the biosynthesis of eukaryotic N-glycoproteins the Glc3Man9GlcNAc2 oligosaccharide is exclusively transferred en bloc to Asn in a consensus sequence of the protein chain. The N-linked glycans are subsequently modified stepwise by the action of glycosidases and glycosyltransferases. Chemically, the N-glycans are generally characterised by a common pentasaccharide core, consisting of an N,N-diacetylchitobiose unit extended with a branched tri-mannosyl entity as shown in Fig. 4a. Characteristic for this structural element are the reporter group chemical shifts of Man H-1, H-2 and the GlcNAc NAc groups as summarized in refs. 19–24. The terminal mannoses can be elongated exclusively with mannose residues, giving rise to the oligomannose type as shown in Fig. 4b. The biosynthesis can further give rise to structures that can be conceived as extensions of the terminal Man residues of the tri-mannosyl core with N-acetyllactosamine moieties, arising from the sequential interplay of glycosidases and glycosyltransferases. This process may afford di-antennary compounds; further branching can give rise to tri-, tri′- and tetra-antennary structures, as illustrated in Fig. 4c. The branching pattern and the antennae can be identified on the basis of their reporters, namely the Man H-1 and H-2 signals, as summarized in Table 5 of ref. 23. For the lactosamine units, the anomeric signals of Gal and GlcNAc and the NAc resonances of GlcNAc, together with glycosylation shifts, are characteristic as discussed in ref. 23. These glycans are designated as complex or N-acetyllactosamine type. Extension of the tri-mannosyl core with N-acetyllactosamine moieties at the Man-3 arm and with Man residues at the Man-6 arm yields the so-called hybrid type of chain, as shown in Fig. 4d. The spectra comprise a combination of the NMR signals of the corresponding antennae as presented in Table 6 in ref. 23.
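The three N-glycan types described above differ only in what extends the two arms of the tri-mannosyl core. The following minimal sketch encodes that rule, assuming a deliberately simplified representation in which each arm is described by the set of extension types it carries ('Man' and/or 'LacNAc'). The function name and the labels are hypothetical, and modifications such as fucosylation, bisecting GlcNAc or sialylation are ignored.

```python
def classify_n_glycan(man3_arm: set[str], man6_arm: set[str]) -> str:
    """Classify an N-glycan by the extensions on its Man-3 and Man-6 arms.

    Each argument is the set of extension types on that arm, drawn from
    'Man' (mannose residues) and 'LacNAc' (N-acetyllactosamine moieties).
    """
    extensions = man3_arm | man6_arm
    if not extensions:
        return "core only"
    if extensions == {"Man"}:
        return "oligomannose"
    if extensions == {"LacNAc"}:
        return "complex (N-acetyllactosamine)"
    if man3_arm == {"LacNAc"} and man6_arm == {"Man"}:
        return "hybrid"
    return "unclassified"


print(classify_n_glycan({"Man"}, {"Man"}))        # oligomannose
print(classify_n_glycan({"LacNAc"}, {"LacNAc"}))  # complex (N-acetyllactosamine)
print(classify_n_glycan({"LacNAc"}, {"Man"}))     # hybrid
```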

Fig. 4.

(a) Common pentasaccharide core of N-glycans, containing a (β1-4) linked Man residue, which acts as branching point. (b) Oligomannose type of N-glycan including numbering of Man residues in the Man9 structure. (c) N-acetyllactosamine type of N-glycan, with sialylated di-, tri-, tri′ and tetra-antennae as extensions of the core. (d) Hybrid type of N-glycans. The Man(α1-6) branch is extended with the oligomannose type and the Man(α1-3) arm with the N-acetyllactosamine type of structure. (e) A branched trimannosyl entity extended with a Xyl residue (β1-2) linked to βMan.

Extensions of Asn-linked GlcNAc may be L-Fuc in (α1-6) and/or (α1-3) linkage. Fuc may also occur at peripheral parts of the glycan in (α1-2), (α1-3), and/or (α1-4) linkage. As summarised in Table 8 of ref. 23 the data of the structural reporter group signals for Fuc H1 and CH3 are characteristic for the various linkages.

A bisecting GlcNAc is a GlcNAc residue attached in (β1-4) linkage to the branching Man 3 of the core. Introduction of this residue gives rise to its own signals and affects the structural reporter groups Man H-1, H-2 and NAc, as presented in Table 6 of ref. 23. Another example is the occurrence of Xyl residues as shown in Fig. 4e. Sialic acids can occur as terminal residues at antennae. The type of glycosidic linkage, (α2-3), (α2-6) or (α2-8), can be deduced from the chemical shifts of H3eq and H3ax, as shown in Table 8 of ref. 23. Substituents of the glycan like alkyl and acyl groups are identified by the corresponding 1H NMR chemical shifts. Inorganic substituents like sulfate and phosphate induce typical downfield shifts of neighbouring protons.

Oligomannose glycans.

As a result of the biosynthesis, Man9GlcNAc2 can be conceived as a mature glycan for the eukaryotic system. Interestingly, for soybean agglutinin it was described that two types of N-glycans would occur, each consisting of Man and GlcNAc in a molar ratio of 9 : 2, but differing in branching pattern.25) However, 1H NMR studies showed that only the well-known Man9GlcNAc2 sequence as presented in Fig. 4b occurs.26) In many glycoproteins of plant and animal origin, partial structures of the ‘mature’ Man9 chain also occur. The numbering of the Man residues is given in Fig. 4b. In legume storage proteins, e.g., in kidney bean glycoprotein II, the structures ranged from Man9GlcNAc2 to Man6GlcNAc2 and in 7S soybean glycoprotein from Man8GlcNAc2 to Man6GlcNAc2.27) In Man8GlcNAc2, D3 is absent as shown in Fig. 5; in the other compounds partially also D2 and D3 are missing. In bovine lactotransferrin, Man8GlcNAc2 lacking D1 is found besides the Man9GlcNAc2 structure.28) In lysosomal α-mannosidase from porcine kidney Man9 is present and Man6, lacking D1, D2 and D3, and also two isomers of Man5 lacking D1, D2, D3 and A as well as a compound missing D1, D2, D3 and C.29)

Fig. 5.

Man8 structure missing Man-D3.

Such partial structures also occur in the urine of some patients with inborn errors of metabolism. For example, in the urine of patients with Gaucher’s disease the presence has been demonstrated of Man5, lacking the Man residues D1, D2, D3 and C; Man4, lacking the Man residues D1, D2, D3, C and 4; and Man2, lacking the Man residues D1, D2, D3, C, 4, A and B.30) In all cases the reporter group signals allow the complete assignment of structures. The partial structures arise from the catabolic route and are helpful in reconstructing the pathway of the stepwise degradation. It should be noted that partial structures are also present with Fuc(α1-6) linked to GlcNAc-1, probably derived from lactosamine-type glycans.
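The naming of the partial structures follows directly from the number of Man residues that remain of the Man9 chain. The bookkeeping can be sketched as below, using the residue labels that appear in the text (3, 4, 4′, A, B, C, D1, D2, D3). The function name is hypothetical, and the sketch counts residues only; it does not check whether a given set of missing residues leaves a connected structure.

```python
# The nine Man residues of Man9GlcNAc2, labelled as in the text and Fig. 4b.
MAN9_RESIDUES = frozenset({"3", "4", "4'", "A", "B", "C", "D1", "D2", "D3"})


def oligomannose_name(missing):
    """Return 'ManN' for the Man9 chain lacking the given Man residues."""
    missing = set(missing)
    unknown = missing - MAN9_RESIDUES
    if unknown:
        raise ValueError(f"not a Man9 residue label: {sorted(unknown)}")
    return f"Man{len(MAN9_RESIDUES) - len(missing)}"


# Examples taken from the structures mentioned in the text.
print(oligomannose_name({"D3"}))                                   # Man8
print(oligomannose_name({"D1", "D2", "D3", "C"}))                  # Man5
print(oligomannose_name({"D1", "D2", "D3", "C", "4"}))             # Man4
print(oligomannose_name({"D1", "D2", "D3", "C", "4", "A", "B"}))   # Man2
```

The last example leaves only Man 3 and Man 4′, consistent with the Man2 compound described for Gaucher’s disease.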

The precursor oligosaccharide in N-glycan biosynthesis.

From the precursor oligosaccharide Glc3Man9GlcNAc2 (see Fig. 6), which is transferred in the biosynthesis to the growing protein chain, partial structures can also be found. Analysis of the glycans of glycoproteins isolated from the ovary of the starfish Asterias rubens (L.) shows the presence of the Man9GlcNAc2 and Man8GlcNAc2 structures. The latter specifically lacks Man-D3, as is evidenced by the absence of the corresponding structural reporter group signals. Furthermore, gluco-oligomannose compounds could be demonstrated, consisting of 9 Man residues and 1–3 Glc residues.31) The monogluco-oligomannose structure, carrying only Glca, is characterized by the structural reporter groups of the Man9 moiety in conjunction with the anomeric proton of Glc. Comparison of the NMR data with those of the Man9 structure shows that only the H-1 and H-2 signals of Man D1 are affected by the presence of Glc, indicating that Glc is attached to Man D1. Furthermore, di- and tri-gluco-oligomannose structures occur, characterized by the anomeric signals of the Glc residues. The assignment of these signals was confirmed by comparison with the synthetic oligosaccharide Glc(α1-2)Glc(α1-3)Glc(α1-3)Man(α1-2)Man.32) This identification of a unique collection of glycans provided insight into the metabolic route starting with Glc3Man9GlcNAc2. A further example of such a structure was obtained from the lipid-linked precursor oligosaccharide isolated from porcine thyroid tissue. The oligosaccharide was released from dolichol pyrophosphate by mild acid hydrolysis, studied by 1H-NMR and the structure compared to that deduced for other organisms.31) The reducing N,N-diacetylchitobiose unit and nine mannose residues are present, the latter in equimolar amounts.33) The chemical shifts are very similar to those we derived for Man9GlcNAc2Asn. The three additional anomeric signals are assigned to Glc residues. 
The values correspond very well with those of the synthetic oligosaccharide Glc(α1-2)Glc(α1-3)Glc(α1-3)Man(α1-2)Man(α1-).32) In addition to the NMR evidence, the branch location of the triglucosyl unit could be elegantly demonstrated by treatment with jack bean α-mannosidase. From the removal of the Man A, D2, B, and D3 residues and the identification of the resulting Glc3Man5GlcNAc(β1-4)GlcNAcα/β it is evident that the triglucosyl unit is attached to the Man-(D1-C-4) branch. This precursor oligosaccharide is a universal compound in the eukaryotic kingdom. In the regular biosynthesis, the precursor trigluco-oligomannose is degraded stepwise after transfer to the protein. This process starts with the removal of the Glc(α1-2) residue by glucosidase I, followed by the removal of the two Glc(α1-3) residues by glucosidase II. Interestingly, in a case of defective N-glycosylation due to glucosidase I deficiency, we found the oligosaccharide Glc(α1-2)Glc(α1-3)Glc(α1-3)Man(α/β) in the urine. This glycan arises from the alternative pathway by the action of endo-α-1,2-mannosidase as shown in Fig. 7.34) The NMR data of fragments of Glc3Man9GlcNAc2 are compiled in Table 2 of ref. 23.
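The order of the trimming steps can be made explicit with a small simulation. The sketch below models only the triglucosyl unit as a list of residues, non-reducing end first, and treats each enzyme as acting solely on a terminal residue with the matching linkage. The function names are hypothetical, and the endo-α-1,2-mannosidase route is not modelled. Under these assumptions, the absence of glucosidase I leaves the Glc3 unit untouched, because glucosidase II finds no terminal Glc(α1-3) to remove.

```python
def glucosidase_I(chain):
    """Remove the terminal Glc(α1-2) residue, if it is present."""
    return chain[1:] if chain[:1] == ["Glc(α1-2)"] else chain


def glucosidase_II(chain):
    """Remove one terminal Glc(α1-3) residue, if it is present."""
    return chain[1:] if chain[:1] == ["Glc(α1-3)"] else chain


def name(chain):
    """Name the glycan carrying the given number of Glc residues."""
    return f"Glc{len(chain)}Man9GlcNAc2" if chain else "Man9GlcNAc2"


def trimming_pathway(has_glucosidase_I=True):
    """List the glycans formed by sequential action of the glucosidases."""
    chain = ["Glc(α1-2)", "Glc(α1-3)", "Glc(α1-3)"]  # non-reducing end first
    steps = [name(chain)]
    enzymes = [glucosidase_I] if has_glucosidase_I else []
    enzymes += [glucosidase_II, glucosidase_II]
    for enzyme in enzymes:
        trimmed = enzyme(chain)
        if trimmed != chain:  # record a step only if a residue was removed
            chain = trimmed
            steps.append(name(chain))
    return steps


print(trimming_pathway())
print(trimming_pathway(has_glucosidase_I=False))
```

The first call lists the regular sequence from Glc3Man9GlcNAc2 down to Man9GlcNAc2, whereas the second returns only the unprocessed precursor.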

Fig. 6.

Glc3Man9GlcNAc2 structure. The consecutive points of degradation by glucosidase I and II, respectively, are indicated by arrows.

Fig. 7.

Formation of the tetrasaccharide Glc3Man as a result of cleavage by endomannosidase of the Glc3Man9GlcNAc2 glycan. Point of cleavage is indicated.

Di- to tetra-antennary complex type N-glycans in α1-acid glycoprotein.

The structure elucidation of the asialo-glycans of α1-acid glycoprotein was among the first glycoprotein projects we started. In this study, NMR spectroscopy provided a breakthrough by affording unambiguous structures. In combination with methylation analysis, the glycans were identified as di-, tri-, and tetra-antennary lactosamine type compounds.35)–39) Also novel was the observation that tetra-antennary glycans occur with Fuc(α1-3) linked to an external GlcNAc. The major constituent contains Fuc(α1-3) attached to GlcNAc 7 of the tetra-antenna, yielding a Lex epitope. In addition, two minor compounds are present in the tetra-antennary fraction having Fuc(α1-3) linked to one of the external GlcNAc residues in the (α1-6) antenna;40) see Fig. 8.

Fig. 8.

Tetra-antennary compounds, extended with a Fuc residue at alternative positions as indicated by dotted lines. In the major compound Fuc is linked to GlcNAc 7.

The di- to tetra-antennary chains are unevenly distributed over the five potential glycosylation sites. Terminal sialic acid can be linked in (α2-3) or (α2-6) linkage, and the linkage type can be assigned with the aid of the structural reporter groups H-3ax and H-3eq. Attachment of Sia(α2-3) to the Lex determinant affords SiaLex.

Penta-antennary N-glycan in hen ovomucoid.

Hen ovomucoid exhibits a large heterogeneity in its N-glycans. In addition to a series of more frequently occurring glycans, a novel bisected penta-antennary structure could be identified by 1H-NMR, in combination with methylation analysis and partial acid hydrolysis.41) The Man H-1 and H-2 and NAc chemical shifts are typical for the bisected structure. The tri-substitution of Man 4′ is further reflected in the chemical shift of H-4 of Man 4′. Subsequently, we observed that GlcNAc 7 could be extended with Gal(β1-4), as followed from the chemical shift of the Gal anomeric proton and from the two significant changes in the GlcNAc reporter group signals in comparison to the agalacto compound.42) The structures are shown in Fig. 9.

Fig. 9.

A penta-antennary structure, which can be extended by a Gal residue at GlcNAc 7 as indicated by a dotted linkage.

Trisialyl di-antennary N-glycan in rat plasma hemopexin.

Glycopeptides obtained by pronase digestion of rat plasma hemopexin contain, in addition to mono- and di-sialyl diantennary constituents, a trisialyl diantennary compound bearing three Neu5Ac residues. The localization of the sialic acid residues is as follows: Neu5Ac(α2-3) is linked to Gal 6, and Neu5Ac(α2-6) is attached to both GlcNAc 5 and Gal 6′. Gal 6 is attached in a (β1-3) linkage to GlcNAc 5 as shown in Fig. 10. In the 1H NMR spectra, the reporter group signals of the three Neu5Ac residues are indicative.43) Interestingly, by methylation analysis it could be shown that tiny amounts of Neu5Ac(α2-8) are also present. These structures are not unique to hemopexin; they are also present in other glycoproteins. However, in this study the NMR parameters were documented for the first time.43)

Fig. 10.

Structure of a trisialyl diantennary compound as found in the N-glycan of rat hemopexin.

O-Acetylated sialic acids in equine fibrinogen.

The glycans of equine fibrinogen consist of mono- and disialo diantennary lactosamine type of structures in a molar ratio of approximately 2:3. The (α2-6) linked N-acetylneuraminic acids are partially O-acetylated at C-4. The ratio of Neu5Ac to Neu4,5Ac2 is about 3:2. The mono-sialo compounds may in the (1-6) arm lack Neu5Ac or Neu4,5Ac2 and even Gal 6′. 4-O-acetylation of sialic acid introduces a large additional heterogeneity. In the disialo compounds the heterogeneity is even larger, since in both antennae the partially O-acetylated sialic acids occur, as presented in Fig. 11. This sialic acid residue is identified by its structural reporter groups namely, H-3ax, H-3eq, H-4, H-5, NAc, and OAc.44) This 4-O-acetylation is a characteristic feature of equine glycoproteins, but later studies showed that this substitution is not unique.

Fig. 11.

4-O-acetylated sialic acid in glycans of equine fibrinogen. Note that the composite structure indicates an impressive heterogeneity resulting from 4-O-acetylation of sialic acid and from the presence/absence of sialic acid as well as of Gal 6′. The various combinations of terminal residues are depicted by lines.

Gal(α1-3) and NGc epitopes as well as 3-O-SO4Gal-6′ and 6-O-SO4GlcNAc-5 in porcine thyroglobulin.

The major acidic compounds of porcine thyroglobulin are mono- or disialylated, fucosylated diantennary compounds. The Man(α1-6) antenna shows an impressive heterogeneity since it can have as terminus Man-4′, GlcNAc-5′ or Gal-6′. Furthermore, Gal-6′ can be extended with Neu5Ac(α2-3), with sulfate at C-3 or with the Gal(α1-3) epitope. At Gal-6′ Neu5Ac or Neu5Gc can be attached in (α2-6) linkage, as shown in Fig. 12. The Gal(α1-3) as well as the N-glycolylsialic acid entities have characteristic structural reporter groups.23) Importantly, neither epitope is compatible with the human system. This is relevant for biologics, biosimilars and eventual xenotransplantation. A further heterogeneity stems from the presence or absence of sulfate attached to C-6 of GlcNAc 5 in the Man(α1-3) antenna. As mentioned before, the extension with sulfate in general gives rise to a downfield shift of the geminal and vicinal proton signals. In a minor fraction, compounds containing tri-sialylated, fucosylated tri-antennary structures are present.45)

Fig. 12.

A composite structure of the glycans occurring in porcine thyroglobulin showing fucosylated diantennary compounds. In the Manα1-6 branch the heterogeneity at Gal 6′ is indicated by dotted lines. The antenna may also end on GlcNAc 5′ or Man 4′. In the Manα1-3 branch the presence or absence of a SO4 group and the occurrence of N-glycolyl or N-acetyl sialic acid are other sources of heterogeneity.

4-O-SO4GalNAc in human urokinase.

Urinary-type plasminogen activator (u-PA) is a serine protease, playing important biological roles. The protein has a single N-glycosylation site at Asn-302, which exhibits a large heterogeneity and an O-linked Fuc at Thr-18 in the epidermal growth factor domain.46) The N-glycans are (α1-6)-fucosylated diantennary complex structures, wherein GalNAc has replaced Gal in both antennae. We observed that either GlcNAc 5 or GlcNAc 5′ can be substituted with Fuc(α1-3), yielding GalNAc(β1-4)[Fuc(α1-3)]GlcNAc(β1-2) as novel structural element as depicted in Fig. 13a.47) In the acidic N-glycans, GalNAc in one branch or in both branches may be 4-O-sulfated, as shown in Fig. 13b. To a minor extent, the (α1-3) antenna can be capped with Neu5Ac(α2-6). Sialylation in (α2-3) linkage can occur in the (α1-6) branch, in conjunction with 4-O-sulfation of GalNAc in the (α1-3) antenna. The major glycan is a (α1-6)fucosylated, diantennary structure with 4-SO4 GalNAc(β1-4)GlcNAc(β1-2) termini. A composite structure gives a summary in Fig. 13c.

Fig. 13.

(a) Fucosylated diantennary glycan carrying GalNAcβ1-4[Fucα1-3]GlcNAc as novel epitope in urokinase. The dotted lines indicate partial presence. (b) Sulfated glycan found in urokinase. Besides in urokinase, 4-SO4GalNAc has also been identified in a number of other glycoproteins. The dotted lines indicate partial presence. (c) The acidic sialylated, sulfated fraction in urokinase exhibits a considerable heterogeneity as summarised in this composite structure. Possible linkages are indicated by dotted lines. (d) Triantennary compound in the acidic glycan fraction from urokinase, carrying sialic acid at GalNAc-8 and 4-SO4GalNAc at the other termini.

Also fully sulfated tri′-antennary compounds are present as well as a tri-antennary compound carrying 4-O-sulfate at GalNAc 6 and 6′ in combination with Neu5Ac(α2-3) at GalNAc 8, as shown in Fig. 13d.48)

HNK1 epitope in P0.

P0 glycoprotein is the most abundant protein constituent of peripheral myelin. It consists of a single extracellular immunoglobulin-like domain, a transmembrane part and a cytoplasmic tail. P0 appears at the initial stage of myelination and functions in the formation and maintenance of myelin compaction as an adhesion molecule. Severe neurological disorders are associated with mutations in the P0 gene. P0 isolated from bovine sciatic nerves has a single N-glycosylation site. P0 is the first glycoprotein wherein the HNK1 epitope consisting of 3-O-SO4GlcA(β1-3)Gal(β1-4)GlcNAc(β1-) could be established by 1H-NMR.49) We could further show that this glycosylation site exhibits a formidable heterogeneity. The major part of the glycan structures could be identified. Basically, the glycans can be conceived as hybrid type or as fucosylated, diantennary complex compounds, with or without a bisecting GlcNAc. The amazing diversity stems from the non-reducing termini attached to the Man(α1-3) and Man(α1-6) residues. At the Man(α1-3) branch GlcNAc 5 can be partially sulfated at C-6. Further extension may give rise to 6-O-sulfo HNK1 and 6-O-sulfo sialyl Lex as remarkable elements. Alternatively, Gal 6 may be extended with Neu5Ac(α2-3), Neu5Gc(α2-3), Neu5Ac(α2-8)Neu5Ac(α2-3) or Gal(α1-3) units. The epitope Gal(α1-3)Gal(β1-4)6-O-SO4-GlcNAc was here observed for the first time. The same holds for Neu5Ac(α2-8)Neu5Ac(α2-3)Gal(β1-4)6-O-SO4-GlcNAc.

The Man(α1-6) antenna can be extended with Man(α1-3), [Man(α1-6)]Man(α1-3), GlcNAc(β1-2) or HNK-1. In the composite formula shown in Fig. 14, the various termini are summarized in rectangles. The characterization of these structures required the combination of advanced NMR and MS techniques due to the small amounts of material and the complexity of the structures.49),50) The glycan moiety of P0 plays a role in cell-cell adhesion, possibly via homophilic binding. At least one of the glycoforms is important for the function of the molecule; it may be an HNK-1 carrying epitope. P0 exposes appropriate glycans for recognition, including its own glycans. The precise molecular mechanism of this auto-recognition remains to be established. The biological significance of the diversity and of the individual epitopes remains an intriguing question.

Fig. 14.

In this composite structure the enormous heterogeneity is shown for the glycans of P0, attached to the single N-glycosylation site. In the various rectangles the structural elements are indicated that are identified at the antennae as possible extensions of the core structure. The dotted lines indicate that these groups can be present or absent. The different possible combinations of elements constitute the actual basis for the observed formidable heterogeneity.

Sda determinant in Tamm Horsfall Glycoprotein.

The Tamm-Horsfall glycoprotein was first recognized as a urinary mucoprotein.51) It is now characterized as the most abundant glycoprotein in normal urine. The excretion level may be as high as 100 mg/day. When isolated from the urine of pregnant women the glycoprotein is called uromodulin, but it still has the same amino acid sequence.52) The glycoprotein is expressed in the thick ascending limbs of the loop of Henle and in the early distal convoluted tubule of the nephron in the kidney.53) It is a membrane-bound, GPI-anchored glycoprotein from which the signal peptide of 24 amino acids and the propeptide of 26 amino acids have been split off.54) By proteolysis it is cleaved from the membrane. The protein consists of three EGF domains, one D8C and one ZP domain as shown in Fig. 15. It is heavily N-glycosylated, up to 25–30% of the molecular mass. The protein is difficult to purify due to its gel-like behavior and its easy polymerization to filaments or matrices. These two features are ascribed to the ZP domain.55) The glycosylation pattern of this glycoprotein is highly complex, as we showed in a few studies.56),57) The protein obtained from the urine of a single male donor contains 8 consensus sequences for N-glycosylation. Counting from amino acid residue 1 in the secreted form, Asn 14 is not glycosylated, in contrast to the sites at Asn 52, 56, 208, 251, 298, 372 and 489. Oligomannose type chains, ranging from Man5GlcNAc2 to Man8GlcNAc2 and representing about 5% of the total carbohydrate content, occur only at glycosylation site 251.58) We obtained more than 150 glycan-containing fractions, representing the combined (micro)heterogeneity of the complex glycosylation sites. 
By 1H NMR analysis of the isolated glycans we established the structure of compounds ranging from a nonfucosylated, monosialylated diantennary structure to sialylated, di-, tri- and tetra-antennary compounds containing fucose and sulfate groups as well as motifs like additional N-acetyllactosamine units and terminal Sda epitopes. The sulfates occur as 3-O-sulfated Gal and as 4-O-sulfated GalNAc in mono- to trisulfated N-glycans. The results can hardly be summarized in a single composite structure. In Fig. 16a a presentation is given for part of the structures in terms of non-reducing termini and core structures. In Fig. 16b one of the most complicated structures is shown.
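The consensus sequences for N-glycosylation counted above follow the rule given in the Introduction: Asn-Xxx-Ser/Thr, with Xxx any residue except Pro. As a minimal sketch of how such potential sites can be located in a one-letter protein sequence, the following Python fragment may be helpful. The function name and the toy peptide are hypothetical and serve only as illustration; the peptide is not taken from the Tamm-Horsfall sequence, and a sequon only marks a potential site, since occupancy (as for Asn 14) has to be determined experimentally.

```python
import re

# N-glycosylation sequon: Asn-Xxx-Ser/Thr, where Xxx is any residue except Pro.
# The zero-width lookahead allows overlapping sequons to be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return the 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy peptide (illustration only, not a real glycoprotein sequence).
toy_peptide = "MNASQNPTLNKT"
print(find_sequons(toy_peptide))  # [2, 10]; Asn 6 is skipped because Pro follows it
```

Applied to a full sequence, such a scan only yields the candidate sites; which of them actually carry glycans, and which glycans, remains a matter of analysis.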

Fig. 15.

The backbone structure is shown of the protein chain of Tamm-Horsfall glycoprotein. The different protein domains and the N-glycosylation sites are indicated. The protein is connected to the membrane via a GPI-anchor.

Fig. 16.

(a) The identified non-reducing termini found in glycans from Tamm-Horsfall glycoprotein are listed. The core starts with the common pentasaccharide unit that can be extended to di-, tri- and tetraantennae. To GlcNAc-1 can be attached Fucα1-6. (b) One of the most complicated tetraantennary structures in the glycan of Tamm-Horsfall glycoprotein is presented. At the termini Sda determinants occur.

The occurrence of donor-specificity in glycan structures of the Tamm-Horsfall glycoprotein (THp) could be demonstrated for N-acetyllactosamine units, oligomannosyl components, and Sda epitopes, respectively. We studied this in detail for the Sda determinant Neu5Ac(α2-3)[GalNAc(β1-4)]Gal(β1-4)GlcNAc(β1-3)Gal as follows. This determinant can be released by endo-β-galactosidase digestion of the glycoprotein, in conjunction with the formation of the tetrasaccharide Neu5Ac(α2-3)Gal(β1-4)GlcNAc(β1-3)Gal and the trisaccharide Gal(β1-4)GlcNAc(β1-3)Gal. By analysis of THp from 4 unrelated healthy male donors we have shown that the molar ratio of released Sda pentasaccharide to tetrasaccharide, and thereby the Sda-related glycosylation, is donor-specific. The donor-specificity holds also for the total content of the Sda pentasaccharide plus tetrasaccharide.59) The question arose whether in genetically identical individuals the donor-specificity would be the same. In fact, a study of THp from two pairs of monozygotic twins showed the qualitative and quantitative identity of the Sda content within each pair, thereby providing additional evidence for the donor-specificity.60)

The THp isolated from pregnant women is designated uromodulin in view of the presumed immunomodulatory properties.52) Since the amino acid sequence has not been altered, it is logical to ascribe any changes in properties to differences in glycan structures. We investigated the glycan structures for uromodulin from 3 pregnant women at different stages of pregnancy. For the N-glycans the negative charge distribution of sialic acid and sulfate remained virtually constant. Consequently, it was presumed that the branching pattern remained unaltered. Pregnancy leads to slight changes in the type of oligomannosyl structures.61) However, it is not certain that these differences arise from pregnancy, because the donor specificity can already account for significant variation in the content of oligomannose structures. In one publication, the occurrence of O-linked glycans was reported.62) These O-linked compounds were reported to undergo changes in structure during pregnancy. We could not confirm the occurrence of O-linked glycans.55)–57) (Rohfritsch, Ph.H. et al. unpublished). Furthermore, we showed that for our preparations the immunosuppressive activity of the glycan pools was invariant during pregnancy.

Glycans occurring on human epidermal growth factor receptor.

Many details are known of the epidermal growth factor receptor (EGFR). EGFR is a transmembrane glycoprotein with 11 potential N-glycosylation sites. The N-glycosylation is essential for its biological function. Here we could focus for the first time on the structure of the glycans of human non-recombinant EGFR. The human epidermoid carcinoma cell line A431 secretes the extracellular domain of EGFR as a soluble 105-kDa glycoprotein. We investigated the structure of the N-glycans of the secreted glycoprotein by NMR and MS. The glycans released by PNGase F show a large heterogeneity. The oligomannose constituents vary from Man8GlcNAc2 to Man5GlcNAc2 and account for about 17% of the glycans, but Man7GlcNAc2 and Man6GlcNAc2 are the main compounds.
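Since MS was used alongside NMR, it may be useful to see how the members of the oligomannose series Man5GlcNAc2 to Man8GlcNAc2 are distinguished by mass. The following Python sketch computes monoisotopic masses from residue compositions. It assumes free, underivatized, neutral glycans with a reducing end; adducts, derivatization and instrument details are not considered, and the function names are chosen here for illustration only.

```python
# Monoisotopic masses of the elements (u).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of glycosidically bound residues (monosaccharide minus H2O).
RESIDUE_FORMULA = {
    "Hex": {"C": 6, "H": 10, "O": 5},             # e.g. Man, Gal
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},  # e.g. GlcNAc, GalNAc
}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def glycan_mass(composition):
    """Monoisotopic mass of a free reducing glycan given as {residue: count}.

    Assumption: neutral, underivatized glycan; one water is added for the
    non-reducing and reducing termini.
    """
    residues = sum(
        formula_mass(RESIDUE_FORMULA[name]) * count
        for name, count in composition.items()
    )
    return residues + formula_mass(WATER)


for n_man in range(5, 9):
    mass = glycan_mass({"Hex": n_man, "HexNAc": 2})
    print(f"Man{n_man}GlcNAc2: {mass:.4f} u")
```

Each additional Man residue adds one hexose increment (about 162.05 u), so the series is readily resolved by mass. Mass alone does not distinguish isomers, however, which is where the NMR reporter groups come in.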

The complex carbohydrates comprise di-, tri′-, and tetra-antennary glycans, both neutral and (α2-3)sialylated compounds, representing 24% and 59%, respectively, of the total glycan pool. The antennae can carry A, H, Lex, Ley, ALey or sialyl-Lex bloodgroup antigens. Despite the enormous heterogeneity, 55 glycan structures were assigned, of which 32 were novel.63),64) In Fig. 17 a representative tri′-antennary structure is presented, carrying the bloodgroup A epitope.

Fig. 17.

Out of the large number of N-glycan-structures attached to sEGFR, an example is given of a tri′-antennary compound containing a bloodgroup A epitope.

Two types of Fuc attached to GlcNAc-1 in honeybee venom phospholipase A2.

The core structure of the N-glycans of honeybee venom phospholipase A2 can show the occurrence of both Fuc(α1-3) and Fuc(α1-6) at the Asn-bound GlcNAc. This was deduced from the structural reporter group signals of Fuc, which are characteristic for these types of linkages. The presence or absence of Man(α1-3) and of the Fuc residues are sources of heterogeneity as presented by dotted lines in Fig. 18a. Also the oligomannose units without Fuc demonstrate some variability as shown in Fig. 18b. Interestingly, in complex-type structures the Man(α1-3) branch has as terminal structure GalNAc(β1-4)[Fuc(α1-3)]GlcNAc.65),66) This is depicted in Fig. 18c.

Fig. 18.

(a) Difucosyl glycans from honeybee phospholipase A2 with heterogeneity as depicted by dotted lines. (b) Oligomannose type structures of honeybee phospholipase A2 without Fuc. (c) Complex type glycan of honeybee phospholipase A2 with core Fuc residues, occurring only partially as indicated by dotted lines. Thereby a significant heterogeneity is introduced.

Bromelain.

Pineapple stem bromelain has only a single N-glycan and exhibits no micro-heterogeneity as determined by Ishihara et al.67) The N-linked glycan of this glycoprotein contains Xyl linked to the branching Man of the core structure. Furthermore, Fuc(α1-3) is linked to the Asn-bound GlcNAc as given in Fig. 19.

Fig. 19.

The glycan structure of pineapple stem bromelain.

The NMR parameters showed that this Fuc(α1-3) could easily be recognised, since the structural reporter groups for Fuc(α1-3) and Fuc(α1-6) differ significantly and are characteristic for each type. This concerns the chemical shifts of Fuc H-1, H-5 and CH3, as well as of the neighbouring GlcNAc signals. Xyl(β1-2) has its own characteristic reporter signals H-1, H-2 and H-5ax and H-5eq.68) In a further study we established also the conformation of the glycan.69)

Glycans derived from hemocyanins.

Hemocyanin is the Cu(I)-containing oxygen transporter glycoprotein of most arthropod and mollusc species. The glycans of the hemocyanins are quite interesting, due to their relatively uncommon structures. We reviewed this group of glycans in ref. 70.

3-O-methyl sugars in hemocyanin.

In a mass spectrometric study we could demonstrate that hemocyanin from Helix pomatia contains 3-O-methylgalactose in addition to Fuc, Xyl, GalNAc and GlcNAc.

However, in hemocyanin from Lymnaea stagnalis the occurrence of 3-O-methylmannose was established.71)

Hemocyanin of the mollusc Lymnaea stagnalis.

Hemocyanin-glycan of Lymnaea stagnalis has a core structure with Xyl(β1-2) linked to the branching Man and 3-O-Me-Man at the non-reducing termini, but without fucosylation as shown in Fig. 20a.72) Alternatively, the Man residues can be extended to a mono- or di-antenna with 3-OMe-Gal(β1-3)GalNAc(β1-4)GlcNAc(β1-2). In turn GalNAc can be substituted with 3-OMe-Gal(β1-3) or with Fuc(α1-2)Gal(β1-3). In Fig. 20b an example of such a structure is presented. Further extensions of 3-OMe-Man or 3/4-OMe-Gal have not been observed, suggesting that here methylation could act as a kind of stop signal.73)

Fig. 20.

(a) 3-OMeMan in glycans of hemocyanin from Lymnaea stagnalis. (b) Extended antennae in the glycans of hemocyanin from Lymnaea stagnalis.

Hemocyanin of Helix pomatia.

In a further and detailed investigation of hemocyanin from Helix pomatia we could establish the localization of 3-O-methylgalactose. The N-glycans obtained by PNGase F degradation have in common the pentasaccharide core largely substituted with Xyl(β1-2) at the branching mannose residue and for the greater part with Fuc(α1-6) at the Asn-linked GlcNAc. Structures lacking Xyl also lack Fuc. Four novel antennae and in total 21 novel mono-antennary and di-antennary N-linked glycans were identified. Most structures contain 3-OMe-Galβ and occasionally 4-OMe-Galβ. The antennae are linked to one or both of the core Man residues.74)

One of the most complex structures is presented in Fig. 21.

Fig. 21.

One of the most complicated glycan structures of hemocyanin from Helix pomatia.

Hemocyanin of Panulirus interruptus.

It should be noted that not all hemocyanins are xylosylated. For example, hemocyanin of the spiny lobster Panulirus interruptus contains mainly neutral N-glycans like oligomannose structures. The pentasaccharide core can also be extended with GlcNAc(β1-2), and sulfate can be attached at position 6 of Man-4, as shown by a dotted line in Fig. 22.75),76)

Fig. 22.

Acidic glycan of hemocyanin from Panulirus interruptus.

General features of O-glycoproteins and the corresponding 1H-NMR data.

The O-glycans belonging to the mucin type of chain are attached to the protein through GalNAc(α1-O) to the hydroxyl groups of Ser or Thr. These glycans have in common GalNAc as starting point that may be extended at C-3 and/or C-6 by Gal, GalNAc or GlcNAc and at C-6 also by Neu5Ac/5Gc. The substitutions give rise to 8 different types of core structure. We analysed the O-glycans as alditols, obtained after alkaline borohydride reduction of the O-glycoprotein. The combination of the chemical shifts of H-2 and H-5 of the GalNAc-ol is an excellent monitor of all core types. In a comprehensive review, we summarized the chemical shifts of a large series of O-glycans.22)

Also the structure of a number of functionally important O-glycans, other than of the mucin type, has been identified. In these cases we characterized the structures by 1H NMR. However, the collection of NMR data is too limited to indicate useful reference ranges of chemical shifts.

Bloodgroup activities in cervical mucus glycoproteins from Bonnet monkey (Macaca radiata).

The mucus glycoproteins from pooled midcycle cervical epithelial secretion of the Bonnet monkey contain multiple bloodgroup activities. Therefore, the structure of the O-linked glycans was analysed. After treatment of the purified glycoprotein fraction with alkaline borohydride the sialyl-oligosaccharide alditols were isolated and fractionated. Interestingly, the glycans contain structure elements corresponding to either A, B or H activities, which may occur in combination with the Sda epitope, as shown in Fig. 23.77) An obvious explanation would be that this heterogeneity of blood group determinants stems from the pooling of the mucus samples.

Fig. 23.

Structure of a glycan from cervical mucus containing the Sda and bloodgroup A epitopes.

N, O-glycoproteins

Poly-N-acetyllactosamine-containing O-glycans of porcine zona pellucida glycoproteins.

As mentioned before, N-glycans and O-glycans can occur together in one glycoprotein. For the analysis of the glycan structures, the N-glycans are often detached first by PNGase, followed by cleavage of the O-glycans from the purified O-glycoprotein. A summary of our procedures for N, O-glycoprotein-analysis has been presented in ref. 78.

The porcine zona pellucida ZP3 family of glycoproteins was isolated from porcine oocytes. In particular, the acidic O-glycans of these N-, O-glycoproteins are interesting because of the occurrence of repeating units of Gal(β1-4)[6SO4]GlcNAc. For 32 O-glycans the structure was determined. Most of the main structures belong to a family of compounds, summarized in a composite formula in Fig. 24.79),80)

Fig. 24.

Acidic O-glycans derived from porcine zona pellucida glycoproteins. The repeating lactosamine units in the compounds are shown.

Typical for this series of compounds are the chemical shifts of Gal H-1 and Gal H-4 in the various lactosamine units as compiled in Tables 1–5 of ref. 80. However, in addition, several minor structures are present.

The structures of the neutral N-glycans have been studied by Kobata et al.81)

Poly-N-acetyllactosamine-containing O-glycans of equine chorionic gonadotropin (eCG).

Equine chorionic gonadotropin belongs to the class of heterodimeric (α and β subunits) glycoproteins. In many respects the hormone behaves very similarly to other gonadotropins, but interestingly, of all glycoprotein hormones it has the highest carbohydrate content. For analysis N- and O-glycans were split off. After isolation and purification, the released glycans were identified by 1H-NMR and FAB MS. The N-glycosylation of the β-subunit exhibits glycans of considerable heterogeneity. They comprise mono-, di-, tri- and tri′-antennary structures. In part of the compounds the Asn linked GlcNAc is substituted with Fuc(α1-6). Neu5Ac(α2-6) is the main terminal residue in the antennae, most pronounced in the di-antennary compounds. Overall, a significant portion of Neu5Ac is 4-O-acetylated. Interestingly, partially 4-O-acetylated Neu5Ac(α2-3) occurs only in tri- and tri′-antennary compounds.

The O-glycans are mostly present in the β-subunit and they consist predominantly of tri-, tetra-, penta- and hexa-saccharides of core type 2. A small part of these glycans contain repeating N-acetyllactosamine units.82),83) These structures are summarized in Fig. 25. It can be anticipated that part of the Neu5Ac residues in O-glycans is 4-O-acetylated, in analogy to the N-linked glycans. However, due to the alkaline borohydride treatment this information gets lost.

Fig. 25.

Structures of O-glycans derived from equine chorionic gonadotropin showing various numbers of repeating lactosamine units.

Recombinant glycoproteins.

The demand for human glycoproteins for diagnostic and therapeutic purposes is still increasing. This has stimulated the production of recombinant glycoproteins since the natural glycoproteins are hardly accessible in large quantities. However, expression in heterologous cells leads to the glycosylation pattern of the corresponding cell type. Comparison of the structures of these glycans with those of the human system is essential to gain insight into the changes in properties and compatibility with the human immune system. N-glycolylneuraminic acid and the Galili epitope Gal(α1-3)Gal(β1-3/4)GlcNAc-R are obvious examples of unfavourable epitopes that should be avoided or eliminated to prevent unwanted reactions. These epitopes can be recognized by their 1H NMR data.23) Nevertheless, some commercially available preparations contain non-human epitopes to some extent.

From a significant number of recombinant glycoproteins, we have established the glycan structures.84) Two examples will be presented. The first one we studied was γ-interferon expressed in CHO cells. A regular N-linked diantennary glycan was found, partially substituted with Fuc(α1-6) linked at the Asn bound GlcNAc and partially capped at both antennae with (α2-3) linked Neu5Ac.85) The exclusive occurrence of (α2-3) linked Neu5Ac is typical for CHO cells, since α2,6-sialyltransferase is absent in these cells. A more complex glycoprotein is human FSH expressed in CHO cells. We determined the glycan structures mainly by 1H NMR spectroscopy.86) The N-linked glycans were identified as monosialylated diantennary, disialylated diantennary (main constituent), disialylated tri-antennary, trisialylated tri-antennary, trisialylated tri′-antennary and tetra-antennary compounds. A small percentage of these chains contains additional N-acetyllactosamine residues. Sialic acid is exclusively (α2-3) linked. Comparison of recombinant and pituitary FSH shows, as significant differences, the occurrence in pituitary FSH of (α2-6) linked sialic acid and bisecting GlcNAc. Importantly, the recombinant product can be pharmaceutically applied and is perfectly compatible with the human system.

C-linked α-mannopyranose in urinary RNase II.

The existence in glycoproteins of a C-linkage between carbohydrate and protein was discovered in human urinary RNase II. This enzyme contains an α-mannopyranosyl residue that is C-linked to the C-2 position of the indole nucleus of tryptophan. This was discovered in the glycopeptide PheThr(α-Manp)TrpAlaGlnTrp, derived from RNase II. The C-linkage was proved by 1H- and 13C-NMR spectroscopy, in conjunction with mass spectrometry as depicted in Fig. 26.4),5),87) The simplest consensus sequence established for C-mannosylation is TrpXxxXxxTrp, wherein the first Trp is C-mannosylated.88) C-mannosylation occurs in many proteins and is widely distributed in eukaryotic organisms. It is accomplished by C-mannosyltransferase89) with dolichol-phosphate-mannose as donor.90) This form of glycosylation does not occur in bacteria or in yeasts, probably due to the absence of a C-mannosyltransferase. A program has been developed for the prediction of potential C-Man sites.91) Interestingly, in glycopeptides derived from RNase II the conformation of the Manp ring at the NMR time-scale can only be described by an ensemble of conformations. A main contribution to this ensemble stems from the 1C4 chair. A further source of flexibility stems from rotation around the Trp C-2–Man C-1 bond. The orientation of Man around the C-linkage in native RNase II is different from that in the glycopeptide and in the denatured protein.5),92) Apparently, the 3-D structure of the native protein affects the conformational features of the C-Man residue. This influence cannot be observed in glycopeptides (Beccati, D. et al. unpublished). The biological function of C-mannosylation requires further investigation.
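The TrpXxxXxxTrp consensus mentioned above lends itself to a simple sequence scan. The Python sketch below lists the positions of Trp residues that are followed, three residues further on, by a second Trp. It is emphatically not the published prediction program of ref. 91, whose method is not described here; it only illustrates the minimal motif, and the function name is hypothetical. The test sequence is the RNase II-derived glycopeptide PheThrTrpAlaGlnTrp in one-letter code.

```python
import re

# Minimal C-mannosylation motif: Trp-Xxx-Xxx-Trp, the first Trp being the candidate.
# The zero-width lookahead reports overlapping motifs (e.g. W-x-x-W-x-x-W) as well.
C_MAN_MOTIF = re.compile(r"(?=W..W)")


def find_c_man_candidates(sequence):
    """Return 1-based positions of Trp residues that start a W-x-x-W motif."""
    return [match.start() + 1 for match in C_MAN_MOTIF.finditer(sequence.upper())]


# Glycopeptide from RNase II in one-letter code (Phe-Thr-Trp-Ala-Gln-Trp).
print(find_c_man_candidates("FTWAQW"))  # [3]
```

As with N-glycosylation sequons, a motif hit marks only a candidate site. Whether the Trp is actually mannosylated depends on the expressing organism and has to be verified experimentally.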

Fig. 26.

α-Mannose in the 1C4 conformation C-linked to the C-2 of tryptophan. The shown conformation of the αManp-residue contributes to the ensemble of conformations in the native glycoprotein. Rotation around the C-C linkage, as indicated by an arrow, yields a conformation contributing to the conformational ensemble in glycopeptides and in denatured glycoproteins.

Glycosaminoglycans.

Glycosaminoglycans (GAGs) constitute an important class of linear glycan chains O-linked to Ser of the protein backbone of proteoglycans. Compounds like heparin, heparan sulfate, chondroitin sulfate, dermatan sulfate and keratan sulfate belong to this class. The functional and biological properties exhibit a large diversity. From a structural point of view the compounds are rather complex despite the fact that they are only built up from a linkage region consisting of Gal(β1-3)Gal(β1-4)Xyl(β1-O)Ser as shown in Fig. 27. The core can be extended with repeating disaccharides with GlcA or IduA and with GalNAc or GlcNAc as constituents. The repeating units can be partially and/or differently sulfated. The resulting large diversity in structure stems not only from the differences in constituents and glycosidic linkages, but also from a diversity in chain lengths in conjunction with the distribution of sulfate groups over the chain. In this way, unique stretches occur in the structure that may be responsible for specific biological effects. For example, in heparin a pentasaccharide unit is the carrier of the anticoagulant properties. In general, the fact that fragments may show a clearly defined activity in bioassays is a complication in defining the biological function in overall terms of the parent structures, because these macromolecules could harbour more than one activity. A further aspect concerns the influence of the neighbouring residues in the polymer on the biological activity of a region in the chain. For the NMR studies we started with the analysis of fragments of the linkage region. Thanks to the availability of synthetic compounds, the structural reporter group signals could excellently be determined.93) As a next step we unravelled by 1H NMR spectroscopy a number of structures in the extended linkage regions of several GAGs, obtained by enzymatic degradation of the macromolecules.
At the reducing end the compounds are still linked to Ser and, due to the specificity of the enzymes used, Δ4-GlcA is present at the non-reducing end. After defining unsubstituted hexasaccharides, the extensions with the disaccharides substituted at various positions with sulfate groups are defined by the usual individual reporter group signals and by sulfate-induced downfield chemical shifts. Characteristic signals belong to Δ4-GlcA. Interestingly, a novel structural element was discovered in chondroitin sulfate from Swarm rat chondrosarcoma, containing a 4SO4Gal residue in the extended linkage region as shown in Fig. 28.94)

Fig. 27.

Linkage region in Glycosaminoglycans.

Fig. 28.

Extended linkage region as occurring in chondroitin 4-sulfate in Swarm rat chondrosarcoma. The novel structural element is 4SO4Gal at that position.

Subsequently, various series of compounds obtained from diverse sources and representing different structures were investigated.95)–99) The occurrence of various sulfate groups affects the conformational behaviour of these structures. Discussion of the effects on the conformation is beyond the scope of this review.

Functions.

The widespread decoration of proteins with covalently linked carbohydrates gives rise to fundamental questions about the function of the glycans. First of all, these glycans have a significant influence on the outer sphere of the biomacromolecule and thereby on its contact with the environment. Secondly, biological functions of glycans are often described in global terms like protein folding, protein trafficking, cell signalling, normal and abnormal cell growth, life-time of cells and glycoproteins, and targets for interaction e.g., with pathogens, receptors, antibodies and lectins. To gain insight into the nature of these functions, translation of the problem to the molecular level is essential. This requires detailed knowledge of the three-dimensional structure of the glycoprotein. It should be noted that a (partially) disordered state is also a form of three-dimensional structure, which might be biologically significant, an aspect that has to be taken into account not only for the glycan moiety of glycoproteins.100)–102) In most of the discussions on glycoproteins it is neglected that local occurrence of protein random coil structures might be of importance for the function. Next, the molecular details of the interaction of glycoproteins with complementary partner-molecules like receptors and the mutual influence of the respective structures are needed as well. In particular, for membrane bound glycoproteins the effect of neighbouring molecules needs consideration. The intermolecular interaction with complementary compounds may concern proteins, carbohydrates, glycoproteins or glycolipids.
Carbohydrate-carbohydrate interaction can be of homo- or hetero-type e.g., in cell-cell interactions.103) Since the carbohydrate-carbohydrate interactions are much weaker than carbohydrate-protein interactions, polyvalent presentation is needed to be effective.104) This interaction may involve (part of) the glycan structure, (part of) the protein, or (parts of) both entities. A number of techniques including SPR, NMR and X-ray crystallography are available to study aspects of the interaction of the glycan.105),106) Although for biological functions the attention is mainly focused on intermolecular interactions, also the intramolecular interactions should be taken into account. The latter interaction may affect the protein-folding and the three-dimensional structure of the glycoprotein. For example, glycosylation can lead to masking of peptide epitopes as an important feature. The intramolecular interactions may be essential for the presentation and dynamics of the glycans and/or protein-chain. In an N-glycoprotein, the interaction of glycans with the protein starts with the Asn-bound GlcNAc. Screening of the protein crystallographic database shows that in X-ray structures this GlcNAc can be oriented in different ways. The orientation can be for example O5-edge, solvent exposed, with α- or with β-face towards the protein. In fact, the GlcNAc-protein interaction has consequences for the conformation and presentation of the outer part of the glycan structure. An interesting example is provided by the two N-glycans of the free α-chain of human chorionic gonadotropin. The N-glycan at Asn78 is resistant towards PNGase F, unless the α-chain has been denatured. This observation is in agreement with the ordered structure of the part of the glycan close to the protein chain. The effect is noticeable in the whole glycan and will thereby affect the interaction with complementary molecules.
In contrast, the other N-glycan at Asn52 can be directly split off by PNGase F, due to the absence of interaction between the Asn-bound GlcNAc and the protein.107),108)

In general, glycoproteins are dynamic entities and may just like other macromolecules consist of an ensemble of conformations. The intrinsic flexibility of glycans enlarges the number of conformations of glycoproteins considerably in comparison to the parent protein. This structural property is relevant in exerting biological functions.

Concluding remarks.

This review is centred on our development of high resolution NMR spectroscopy as an important new tool for the structure determination of glycans. As illustrated, we could establish a novel route to identify unambiguously numerous N- and O-linked glycans of glycoproteins and discovered C-linked mannose as a new type of linkage. A large collection of characteristic chemical shifts was obtained. The structural reporter group concept is robust and can be widely applied for the identification of carbohydrate chains stemming from various sources. The structural reporter group signals are rather insensitive to alterations in the structure remote from the locus involved. Interestingly, the principles of glycan identification derived from the structural reporter group concept turned out to be perfectly applicable to many other glycans, such as those occurring in glycolipids, glycosaminoglycans, proteoglycans and oligosaccharides derived from homo- as well as hetero-polysaccharides. The greater part of the NMR parameters we collected is summarised in refs. 19–24. The analysis of glycans, starting from simple compounds up to highly complex structures, yielded insight into metabolic pathways comprising biosynthesis and catabolic processes. Also for the diagnosis of congenital disorders of carbohydrate metabolism, it is relevant to be able to determine unambiguously the structure of metabolites e.g., in urine or serum. Many fundamental results were obtained that allowed further studies regarding the biological and physical function of glycans in glycoproteins. The studies reveal the impressive variety of glycan structures in nature. For cell-cell interactions also glycolipids have an essential role. Glycoproteins and glycolipids have many epitopes in common.
For each of the glycans the presentation to the outside world, the imbedding of the glycoconjugates in the cell membrane, as well as the mobility and dynamics have to be considered to come to full understanding of the mode of action.

Important remaining challenges are the functioning of glycans in embryogenesis, cell differentiation, normal and malignant growth and senescence. Advances in analysing the human proteome and the proteome-adaptation in cell programming show the need for the study of the dynamic aspects of the glycome and the glycoproteome to gain insight into the actual biological functioning of glycoproteins. The unravelling of the message encoded by those structures thereby remains an intriguing problem.

Profile

Johannes Frederik Gerardus (Hans) Vliegenthart was born in The Netherlands in 1936. He studied Chemistry at Utrecht University and obtained his Ph.D. in 1967. Subsequently, he was offered a research position at Utrecht University to create a new research group focused on Bio-organic Chemistry. As specific project, he selected determination of the structure of carbohydrates by mass spectrometry and NMR spectroscopy. These techniques were under development, offering new approaches to this field. The application of mass spectrometry enabled the determination of structural details of oligosaccharides and the identification of a large number of members of the sialic acid family. A breakthrough was realized by the introduction of high-resolution 1H-NMR spectroscopy. The resolving power of the high-field instruments (220–900 MHz) allowed the identification of numerous glycans, ranging from oligosaccharides to very complex compounds. For this work he developed the ‘structural reporter group concept’ that made the unambiguous analysis of glycans feasible. In parallel programs, organic synthesis of oligosaccharides was performed and the development of synthetic carbohydrate-based vaccines initiated. New methods were explored to study carbohydrate-carbohydrate interactions. For example, after structure determination and synthesis of the glycan structures involved in the cell-cell interaction of cells of the sponge Microciona prolifera, the molecular basis of the interaction could be established.

He was author or co-author of about 750 research publications and co-editor of 7 books.

Under his supervision 90 Ph.D. theses were prepared. He was appointed Professor at Utrecht University in 1975, Head of Department in 1980 and honorary Professor in 2003. He was Dean of the Faculty of Chemistry from 1985–1989 and from 2000–2003. He was founder of the Bijvoet Center for Biomolecular Research in 1988, its Scientific Director till 2000 and chairman of the International Scientific Board of the Bijvoet Center since 2006.

He acted as Chairman/Member of various national and international scientific bodies, including the International Scientific Advisory Committee of the Science Frontier Program of the Riken Institute, Wako, Japan. He organized several international and European conferences.

He is Foreign Member of the Royal Swedish Academy of Sciences (1987), Member of the Royal Netherlands Academy of Arts and Sciences (1990), Honorary Member of the American Society for Biochemistry and Molecular Biology (1989), of the Netherlands Society for Glycobiology (2006), and Fellow of IUPAC (1998).

He is Honorary Doctor at University Debrecen, Hungary (1992), at University Lille, France in 1993 and at University Stockholm, Sweden in 1997. He received the Hilditch Memorial Lecture Award in 1978, the Shield of the Medical Faculty of the University of Tokyo, Japan (1988), Louis Pasteur Medal of the University Lille, France (1993), the Claude S. Hudson Award from the ACS (1994), Bijvoet Medal of Utrecht University in 2000, Medal of Rome University, Tor Vergata, Italy (2003), Silver Medal of Utrecht University in 2003. He was distinguished as Knight in the Order of the Lion of the Netherlands (1998).

Acknowledgement

The author is deeply indebted to all former members of the Department Bio-organic Chemistry and to many colleagues from all over the world for invaluable contributions to the studies described in this review. In particular, the fruitful collaboration with colleagues in the University Lille, France and in several universities and institutes in Japan should be emphasized.

References
 
© 2017 The Japan Academy