ACTA HISTOCHEMICA ET CYTOCHEMICA
Online ISSN : 1347-5800
Print ISSN : 0044-5991
ISSN-L : 0044-5991
REVIEW
What Cyto- and Histochemistry Can Do to Crack the Sugar Code
Felix A. Habermann, Herbert Kaltner, Alonso M. Higuero, Gabriel García Caballero, Anna-Kristin Ludwig, Joachim C. Manning, José Abad-Rodríguez, Hans-Joachim Gabius
Open access

2021 Volume 54 Issue 2 Pages 31-48

Abstract

As letters form the vocabulary of a language, biochemical ‘symbols’ (the building blocks of oligo- and polymers) make writing molecular messages possible. Compared to nucleotides and amino acids, sugars have chemical properties that make it possible to reach an unsurpassed level of oligomer diversity. These glycans are a part of the ubiquitous cellular glycoconjugates. Cyto- and histochemically, the glycans’ structural complexity is mapped by glycophenotyping of cells and tissues using receptors (‘readers’, thus called lectins), thereby revealing its dynamic spatiotemporal regulation: these data support the concept of a sugar code. When proceeding from work with plant (haem)agglutinins as such tools to the discovery of endogenous (tissue) lectins, it became clear that a broad panel of biological meanings can indeed be derived from the sugar-based vocabulary (the natural glycome incl. post-synthetic modifications) by glycan-lectin recognition in situ. As a consequence, the immunocyto- and histochemical analysis of lectin expression is building a solid basis for the steps toward tracking down functional correlations, for example in processes leading to cell adhesion, apoptosis, autophagy or growth regulation as well as targeted delivery of glycoproteins. Introduction of labeled tissue lectins to glycan profiling assists this endeavor by detecting counterreceptor(s) in situ. Combining these tools and their applications strategically will help to take the trip toward the following long-range aim: to compile a dictionary for the glycan vocabulary that translates each message (oligosaccharide) into its bioresponse(s), that is, to crack the sugar code.

I.  Introduction

A central aim of life sciences is to understand how cellular communication works. On the molecular level, information must be coded in the sense that ‘symbols’ (biochemical compounds) are used to convey information. It is ‘understood’ (decoded) on the receiving end by a ‘reader’ (receptor). Within the flow of biological information, the chemical rules of intermolecular interactions will govern the extents of complementarity and selectivity that underlie the ‘reading’ process. Molecular pairing will then set the stage for post-binding activities on different levels such as regulation of gene expression, cell adhesion, signaling or targeted transport and delivery. That sugars form such a set of symbols to put manifold messages into oligosaccharides (sugar code) is a fundamental concept, whose study offers the opportunity to gain new insights into how biological information is stored, read and translated.

In our review, we first explain principles of the three alphabets of life for coding information. This introduction guides us to the special talents of sugars, that is, to appreciating i) the unsurpassed coding capacity of glycans and ii) the enormous functional potential of glycan-receptor (lectin) recognition. Graphic examples will illustrate how sugars become such versatile biochemical signals, e.g. a postal code in glycoprotein transport, and how cyto- and histochemistry can contribute to understanding the connection between glycan structures and their emerging biofunctionality, that is, to cracking the sugar code.

Beyond the realm of small molecules such as hormones or metabolites, the possibility of ‘writing’ molecular messages in the form of oligo- and polymers by using a panel of building blocks (monomers) greatly enhances the scope of information storage. In fact, it opens the door to wide structural variability of the products, as letters of an alphabet build the vocabulary of a language. Combinations of letters (oligomers) that make sense to a reader (receptor) are then biologically relevant. In general, such a system of biochemical symbols endowed with the ability for being covalently arranged into ‘words’, i.e. an alphabet of life, is the basis for developing a code, a term commonly used in different contexts with some ambiguity. The structures of the letters and their sequence, the equivalent of a message composed by a writer, define the meaning of the message; its reading by intermolecular recognition and the ensuing transfer of the presented information into distinct post-binding responses translate the coded information into the intended activity: this chain of events from writing to reading and taking action defines a code system in our context.

II.  Code Systems of Life

In biology, we have three such code systems. They are interconnected by the genetic code. In its popular sense, the genetic code assigns amino acids to their codons in template (mRNA)-directed protein biosynthesis. This translation of a triplet into an amino acid is the prerequisite for all the downstream activities of proteins and thus the root for the large signal diversity (vocabulary) of information coding that depends on proteins as writers, editors, erasers and readers. Obviously, the first code system (and biochemical foundation for heredity) rests on nucleotides as symbols.

Considering the essential need for an error-free translation (and replication) process, it is logical that the information density of nucleotide-based messages must not be too high. As a consequence, a basic four-letter alphabet (of readily distinguishable symbols) appears best suited to establish the first alphabet of life [112]. Words, sentences and a library of books (the genome) are formed by connecting these monomers (nucleotides) via the 5',3'-phosphodiester bond, and the genetic coding by nucleic acids is the elementary source of information for proteins and beyond. As the line of pioneering experiments on reprogramming virulence in strains of pneumococci first with heat-inactivated bacteria [12, 13] and then by “a highly polymerized and viscous form” of DNA revealed, the enzymatic synthesis of a chemically completely different biomacromolecule, i.e. a capsular polysaccharide made of cellobiuronic acid units (its biochemical nature will be explained below in the fourth paragraph of section IV), becomes possible by the transformation process (an uptake of the required genetic material): “thus, it is evident that the inducing substance [DNA] and the substance produced in turn [polysaccharide] are chemically distinct and biologically specific in their action” [2]. Interestingly, “biochemists met this discovery with disbelief” [trained to give preference to proteins as the substance for genes], and the notion that genes consist of nucleic acid “took a long time to be generally accepted” [85].

Relevant for our topic, this paradigmatic concept brings together the three classes of biopolymers in Nature, i.e. nucleic acids, proteins and polysaccharides. They share the principle to be written with letters of the alphabets of life, i.e. nucleotides, amino acids and sugars. However, fundamental features such as coding capacity differ largely among them. Before we explain the extraordinary role of sugars in coding, we will next look beyond the basic flow of information from DNA to protein under the rules of the genetic code to find an intimate cross-talk between compound classes and efficient means to extend the vocabulary (in each biochemical language).

III.  The Alphabets of Life

Genomic sequences, to start with, are not only templates for proteins but also contain the information for the regulation of gene expression, first described in the operon model [45]. In this instance, the encoded information is read by a protein, an elementary example of molecular cross-talk. Site-specific binding of a repressor (in the eubacterial operon model) or of transcription factors (in eukaryotes), and this in a combinatorial manner, interprets regulatory DNA sequences that are the basis for the programming of gene expression [45, 65, 91, 104]. Enhancers and silencers make their presence felt on when and how intensely genes should be expressed. These discoveries on regulation of transcription paradigmatically exemplify how a biochemically written message is read, and they also set the stage to teach a second lesson: work on this platform revealed a basic principle of how the coding capacity reached by a set of symbols can be increased. In the case of biomolecules, this property is defined by the structure, i.e. the symbols and their sequence as well as the shape of the oligo- or polymer. Within a message, it was revealed that certain letters can gain a new meaning by a distinct biochemical modification, as done linguistically by an accent or an Umlaut.

On the level of the genome, a site-specific (epigenetic) alteration occurs: the introduction of a methyl group into distinct cytosine moieties is the biochemical equivalent of adding a new aspect to a letter’s meaning in a special context. The status of methylation of cytosines at particular sites as well as the extent and stages of oxidation of this group up to a carboxylate have been identified as switches for gene expression [87], and the more than 170 known types of RNA modification attest to the broad scale of implementation of this concept [97]. Obviously, introducing postsynthetic alterations can compensate for the inherent restrictions of information coding by nucleotides to a linear sequence with just four letters and a chemically uniform backbone structure.

Thus, it is not surprising to see that this elegant principle is at work for proteins, too, and this with a remarkable level of sophistication. Looking again at the control of eukaryotic gene expression for an instructive example, histones are known to acquire a biochemically unique signature of posttranslational modifications. These have been found i) to impact the nucleosome structure directly and ii) to create new recognition sites for receptors, and their resulting activity as signals has prompted the introduction of the term ‘histone code’ [3, 46, 92, 96, 110]. Terminologically, these information-bearing substitutions on a written message are carried out by editors, who can further process added groups; the need for inherent spatiotemporal dynamics and thus reversibility is fulfilled by erasers; the presence of distinct symbols is interpreted locally (intramolecularly) by the structural context and/or by external readers, for example in the case of acetylated lysine in a histone by a bromodomain of a receptor [77, 111, 113]. Fittingly, different families of receptors, thus a complex interactome within the histone code, have evolved to make recognition of individual types of modification possible, and this is a further instructive example of what is, as we will see below, a recurring fundamental theme in information transfer. Among proteins, it has become popular to use the term ‘code’ by referring either to the target of the modification(s) as in actin (or tubulin) code [75] or to the attached group. The latter is done in the case of the ubiquitin code: it covers the emerging complexity of ubiquit(in)ylation (or ubiquitination), which yields linear and branched chains of diverse lengths besides the addition of a single unit at one or more sites of a target protein [56, 82, 121].

In summary, nucleotides and amino acids are the letters of the first two alphabets of life. ‘Editors’ introduce molecular substitutions into certain ‘letters’ of written messages. These are (mostly) linear (not branched) oligo- and polymers, all letters connected by a uniform type of linkage. This property sets an inherent limit to the extent of coding capacity of these two alphabets of life. In order to create a large number of short messages with a clear biological meaning, as is for example required on cell surfaces, bringing letters together in different ways is necessary, and here carbohydrates come into play.

Toward this end, the carbohydrates have unique structural talents that favor high-density coding [19, 21, 51]. Impressive as they are, the following numbers are intended to spark curiosity by documenting what sugars make possible as molecular units for oligomer synthesis: whereas the set of 20 amino acids can generate at most a total of 6.4 × 10⁷ hexapeptides, using 20 carbohydrate letters will theoretically yield 1.44 × 10¹⁵ linear and branched hexasaccharides, and this pool size will grow tremendously by introducing substitutions such as a sulfate group, already by a factor of 10 when a single modification is introduced into a trisaccharide [63]. As a consequence, what is known to be ubiquitously present in Nature as “sugary coating of cells” termed glycocalyx [4], as (complex) homo- and heteropolysaccharides in and around cell walls [72, 88, 103] or as constituent of cellular glycoconjugates (glycoproteins and glycolipids) [9, 32, 57, 74, 90] is increasingly attracting attention. Explicitly, the reason to propel work on glycans to the forefront is that “carbohydrates [as symbols for the sugar code] are ideal for generating compact units with explicit information properties” [120]. Ironically, the mentioned access to an exceptional structural diversity entails an inevitable downside: glycans are much more difficult to study than nucleic acids and proteins. Having herewith raised the interest to learn about this set of symbols and their messages, clearly defining the structural parameters that underlie coding by sugars is now called for.
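The hexapeptide count quoted above follows from simple arithmetic: a linear chain with one uniform linkage type offers one free choice of letter per position. The short Python sketch below reproduces that number and compares it with the hexasaccharide figure, which is taken as quoted from [63] and is not derived here; all variable names are illustrative assumptions.

```python
# Sanity check of the hexapeptide count quoted in the text.
ALPHABET_SIZE = 20      # number of letters in the set (amino acids)
OLIGOMER_LENGTH = 6     # hexamer

# Linear chain, uniform linkage: one independent choice of letter per position.
hexapeptides = ALPHABET_SIZE ** OLIGOMER_LENGTH

# Hexasaccharide pool size as quoted from [63]; taken as given, not derived here.
HEXASACCHARIDES_QUOTED = 144 * 10**13   # 1.44 x 10^15

# How many quoted hexasaccharides there are per possible hexapeptide.
ratio = HEXASACCHARIDES_QUOTED // hexapeptides

print(f"hexapeptides: {hexapeptides:.1e}")
print(f"quoted hexasaccharides per hexapeptide: {ratio:.2e}")
```

Under these assumptions, the quoted glycan pool is larger than the peptide pool by more than seven orders of magnitude, before any substitution such as sulfation is considered.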

IV.  The Biochemical Basis of the Sugar Code

The third alphabet of life in animals is constituted by the letters listed in Fig. 1A. In solution, these carbohydrates commonly form a pyran-like (cyclic) structure with two positions of the substituents in the ring’s plane, i.e. equatorial or axial. Glucose (Glc; transiently present on N-glycans during their processing [122], permanently a component of the disaccharide bound to collagen at hydroxylysines [39] and the starting point for the biosynthesis of the glycan chains of glycosphingolipids [57]) is the most abundant sugar in Nature. This hexopyranose, with its all-equatorial hydroxyl groups (at C2, C3 and C4 as well as the bulky exocyclic hydroxymethyl group), reaches the energetically most favorable conformer (Fig. 1A). Since all these groups point away from the ring, the possibility for steric clashes is thereby minimized. Speaking of the hydroxyl groups of the molecule, the inspection of their positioning makes evident that the where-in-space of the hydroxyl groups establishes a topological signature (Fig. 1A). Of note, it is relevant for their engagement in intermolecular bridging by H bonding during receptor binding. A glycan-based message can thus be read readily and specifically by building a network of H-bridge bonding. Next, another aspect warrants attention: the answer to the question of how carbohydrates make structural diversity possible.

Fig. 1.

The letters of the third alphabet of life (carbohydrates defined by name, their abbreviations and the low-energy conformer with its naturally used anomeric position(s)) (A). Illustrations of how such a biochemical symbol (here the β-Gal anomer) can become part of a multitude of bioactive messages (oligosaccharides) (B), of how a panel of glycan-based messages can be altered by a post-synthetic modification (here site-specific sulfation) (C) and of how a glycan (the pentasaccharide of ganglioside GM1, a biological key) can fit into different binding sites (locks), presenting its three low-energy conformers in panel D (for further information on the bioactivities of each conformer, please see [64]).

In order to enzymatically link carbohydrates by glycosyltransferases to a chain (message), the anomeric center (at C1 of hexopyranoses) is the site for activation (to prepare a sugar unit for conjugation), and it can be done in its two positions, i.e. the α- or β-forms. This is symbolized by arrows in Fig. 1A. Glycan synthesis attains structural variability (thus coding capacity) beyond the sequence by this parameter, i.e. the possibility to select one of the two anomeric positions. Obviously, defining the anomeric position will then pose a challenge to characterizing glycans in structural detail. That the nature of the anomeric position has a tremendous impact on the properties of products is underscored by noting the differences between cellulose (β-linked Glc polymer) and glycogen/starch (α-linked). Known from the homopolysaccharide chitin [72] and from its presence at branch ends and the core of N-glycans of cellular glycoconjugates [83, 122], a Glc derivative is widely present in Nature: Glc is modified to the corresponding amino sugar (GlcN at C2) that is acetylated to produce N-acetylglucosamine (Fig. 1A). This modification changes the character of the letter. The same is true for presenting the hydroxyl group at C2 in axial position, hereby establishing the epimer mannose. The first row of Fig. 1 therefore teaches the lessons that a letter has its own signature for complementarity and that structural variability beyond the order of letters can be generated in glycan synthesis via anomery and epimery.

The conversion of the position of a hydroxyl group from equatorial to axial is naturally also encountered for the 4-epimer galactose (Gal) and its 2-N-acetyl derivative (GalNAc) (Fig. 1A). As noted above, these chemical alterations change the signature for H bonding and are sufficient to give each carbohydrate the character of a new symbol. When a hydroxyl group is either removed or oxidized, the molecular recognition process will similarly be affected. The 6-deoxy derivative of Gal in the sugar alphabet is fucose (Fuc), which is present in cellular glycoconjugates not as the D- but as the L-isomer (α-anomer (Fig. 1A)). When Glc is oxidized at C6 to a carboxyl, a negative charge is generated, which makes an ionic bond possible during recognition by a protein. The resulting D-glucuronic acid (GlcA) can undergo enzymatic epimerization (at C5) to L-iduronic acid (IdoA), both shown in the bottom row of Fig. 1A. A carboxyl group is also presented by N-acetylneuraminic acid, the parental compound for more than 50 natural derivatives called sialic acids [89, 100]. One of them, i.e. the N-glycol(o)yl derivative, can be considered as the equivalent of 5-hydroxymethyl cytosine in the nucleotide alphabet. This case documents that the same principle, i.e. a hydroxylase processes a methyl group, works on more than one alphabet of life.

Oligo- and polysaccharides are produced from these symbols by forming the glycosidic bond. In this process, the incoming sugar that is activated at the anomeric center in α- or β-position can in principle be transferred to any hydroxyl group of the acceptor. The resulting linkage can then connect the C1’ (or C2’ for sialic acids) position of the donor with hydroxyl groups at C2, C3, C4 or C6, respectively, of the acceptor, as shown by the arrows in Fig. 1B. Instead of a single diglycoside (as is the case for dinucleotides or dipeptides), four products can be obtained; when considering the two anomeric positions of the donor, then a total of eight diglycosides can be obtained, each one defined by i) the order of symbols (carbohydrates), ii) the type of the anomeric position and iii) the pair of linkage points. By applying this nomenclature rule, the structural unit of the mentioned capsular polysaccharide, i.e. cellobiuronic acid, is not simply GlcA-Glc but GlcAβ1,4Glcβ1,3. At this point, of course, the question arises as to whether each hydroxyl group can really play a role as acceptor in the synthesis of cellular glycans.
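The count of eight diglycosides given above can be reproduced by enumerating the two anomeric positions of the donor against the four acceptor hydroxyl positions named in the text. The following Python sketch does exactly that; the function name, its parameters and the string layout (modeled on the GlcAβ1,4Glc notation used above) are assumptions made for illustration, and the enumeration says nothing about which linkages actually occur in cells.

```python
from itertools import product

ANOMERS = ("α", "β")               # the two anomeric positions of the donor
ACCEPTOR_POSITIONS = (2, 3, 4, 6)  # acceptor hydroxyl groups named in the text


def diglycosides(donor, acceptor, donor_carbon=1):
    """List all formally possible donor-acceptor linkages.

    donor_carbon is 1 for hexopyranoses and 2 for sialic acids,
    as stated in the text. Hypothetical helper, for illustration only.
    """
    return [
        f"{donor}{anomer}{donor_carbon},{position}{acceptor}"
        for anomer, position in product(ANOMERS, ACCEPTOR_POSITIONS)
    ]


names = diglycosides("GlcA", "Glc")
print(len(names))   # 2 anomers x 4 linkage points
for name in names:
    print(name)
```

The linkage found in cellobiuronic acid, GlcAβ1,4Glc, is one of the eight entries; swapping donor and acceptor yields a further eight, which is why the order of symbols counts as a separate parameter.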

By focusing on the β-anomer of Gal as instructive proof-of-principle example (placed in the figure’s center), Fig. 1B serves to present aspects of the wide range of physiological use of a carbohydrate letter when writing glycan-based messages: moving through this figure in clockwise direction, examples for involvement of each hydroxyl group of Gal in glycosyltransferase reactions are depicted. Obviously, the diversity of enzymes that is required for glycan assembly to exploit the described potential of carbohydrates has developed (for an overview on enzymes of N- and mucin-type O-glycan synthesis, please see [6]). The illustration of the product panel in Fig. 1B of an exemplary case (please see also Fig. 1C for further cases of glycan structures) also explains why it is experimentally much more difficult to define oligosaccharides than to do so for nucleic acids and proteins but why, in turn, the coding capacity of glycans is unsurpassed, and Fig. 1C further deals with this aspect.

As mentioned above, a recurring theme of increasing the capacity to store information in molecular messages written with nucleotides and amino acids has been identified as adding distinct post-writing modifications, and it is found for carbohydrates, too. Intriguingly, the biochemical route of such an enzymatic pathway can be shared between alphabets of life, as has been indicated above for hydroxylation and is now illustrated for the case of sulfation in Fig. 1C. Following glycan synthesis, the transfer of a sulfate group from its shown donor (PAPS) is possible by sulfotransferases; other members of this enzyme family are known to carry out protein sulfation [16, 42, 71]. The survey of products of glycan sulfation given in Fig. 1C includes relevant parts of glycosaminoglycans, N- and O-glycans and a glycosphingolipid (galactocerebroside), demonstrating how a modification effectively widens the panel of sugar-encoded messages [7, 14, 40, 86]. The same holds true for other types of substitutions, for instance for 6-phosphorylated Man in N-glycans of glycoproteins destined to reach the lysosomes: it can be likened to the postal code for delivery of letters and packages [10, 55, 59]. This analogy graphically ties the flow of information to a reading (interaction) process, which takes us from examining glycans in two dimensions to now looking at the shape. In addition to the already noted presentation of chemical groups in the hexopyranoses suited to establish molecular complementarity, the two rings of a diglycoside can rotate around the glycosidic linkage as hands can do when the thumbs touch: each constellation is defined by its Φ, Ψ-angles (from 180° to −180°). Inspection of the behavior of oligosaccharides in three dimensions has disclosed a favorable feature toward contact building.

Alluding to the famous lock-and-key principle that has been deduced from fundamental studies of enzymatic hydrolysis of glycosides [15], a high degree of intramolecular flexibility around glycosidic linkages would impede a recognition process and put an entropic barrier in its way. If, however, the conformational space of an oligomer is structured and has local (key-like) minima, then binding partners can find each other much more easily, as the matching key glides into its lock. What requires the structural context of a protein for bringing a peptide into a distinct conformation is often seen to come by its nature at the level of oligosaccharides, that is, attaining few, energetically privileged conformers (for a physiologically relevant case of what such key-like structures of the same glycan look like, Fig. 1D presents the three conformers of the pentasaccharide of ganglioside GM1, each one in the right shape to associate with a distinct receptor) [18, 116]. Literally using the phrasing of the lock-and-key paradigm, the conformers of an oligosaccharide such as the one shown in the center of Fig. 1D resemble a bunch of keys, “each of which can be selected by a receptor” [37].

In summary, the third alphabet of life makes writing of messages possible that are small and exceptionally diverse. Fittingly, the toolbox of the enzymatic assembly line, e.g. glycosyltransferases, is well-stocked. In order to prove that this potential of carbohydrates for coding is realized, the diversity of glycans has to be mapped in situ cyto- and histochemically. This has first been done rather qualitatively by classical chemical procedures like applying dyes or combining them with a chemical reaction (especially oxidation of cis-diols by periodate) [102]. The discovery of reagents that perform blood-group ABH typing as reliably as antibodies do paved the way to a more selective approach to characterize the glycome (glycophenotyping). Such proteins that therefore ‘select’ a member of this group of cell surface epitopes and ‘read’ the (glycan-)encoded information were termed lectins, derived from the Latin verb legere [5] (for review of the history of lectinology, please see [50, 54]). The corresponding technique is thus called lectin cyto- and histochemistry. In the respective assays for blood-group typing, these plant lectins bridge (agglutinate) erythrocytes of the matching blood group. This activity explains the synonym ‘phytohaemagglutinin’ used for plant lectins.

V.  Glycophenotyping by Lectins

Lectins are currently defined as glycan-binding (glyco)proteins that are separated from the classes of carbohydrate-specific antibodies and enzymes (e.g. glycosyltransferases or glycosidases) as well as transporters for free mono- to oligosaccharides. Lectin classification is based on the sugar specificity and the folding of the carbohydrate recognition domain (CRD). More than a dozen structural motifs are known (for a gallery of CRD structures found in plant lectins, please see [69]; for examples of plant lectins used for glycophenotyping and their specificity profiles, please see Table 1).

Table 1.  Glycan specificity of plant/fungal agglutinins used for histochemical glycophenotyping
Species | Abbreviation | Monosaccharide specificity | Potent glycan ligands
Arachis hypogaea (peanut) | PNA | Gal | Galβ3GalNAcα/β (in α-anomeric linkage to Ser/Thr known as core 1 disaccharide, Thomsen-Friedenreich antigen and CD176)
Erythrina cristagalli (coral tree) | ECA | Gal | Galβ4(3)GlcNAc (type II > type I), affinity increase by bivalency of Manβ2/6(LacNAc)2
Lycopersicon esculentum (tomato) | LEA | (a) | Core and stem regions of high-mannose-type N-glycans, LacNAc oligomers (GlcNAcβ3Galβ4GlcNAcβ3Gal)n present in complex-type N-glycans and core 2/4 mucin-type O-glycans (b), affinity increasing in proportion to chain length (c)
Phaseolus vulgaris (leukoagglutinin) (kidney bean) | PHA-L | (a) | Tetra- and tri-antennary N-glycans with β6-branching
Maackia amurensis (leukoagglutinin) | MAA-I | (a) | Neu5Acα3Galβ4Glc(NAc), 3'-sulfation tolerated
Polyporus squamosus (polypore mushroom) | PSL | (a) | Neu5Acα6Galβ4(3)Glc(NAc) (over 300-fold more active than LacNAc, not reactive with free Neu5Ac); 6'-sulfated LacNAc about 20-fold less active; sialyl Tn not active
Dolichos biflorus (horse gram) | DBA | GalNAc | GalNAcα3GalNAcα3Galβ4Galβ4Glc (αGalNAc (Tn) comparatively weak), histo-blood group A-tetrasaccharide

(a) No monosaccharide known as ligand; (b) dual reactivity documented in [53, 81]; (c) [109].

Technical protocols have been developed for their routine application to localize glycans [95]. These methods make it possible to map the topology of the steps of the glycan-producing machinery in the Golgi apparatus [84, 94], as they chart the profile of presence of their cognate glycans at any site in a cell or tissue. “Diversity of cell glycoconjugates shown histochemically” [107] also discloses the spatiotemporal regulation and dynamics of this type of cellular vocabulary. The illustrations in Fig. 2 (on an early bovine blastocyst and its presentation of glycoconjugates that contain αGalNAc residues) and in Fig. 3 (on sections of fixed embryonic (metanephric) kidney using six lectins) exemplarily document the non-random distribution of respective glycans listed in Table 1. Flanked by the results of biochemical glycan analysis in general [31, 78, 106] and also for the αGalNAc glycosylation in particular [8, 119], the application of lectins as tools in cyto- and histochemistry thus verifies the assumption that the third alphabet of life creates an unsurpassed diversity of compact messages.

Fig. 2.

Lectin histochemistry to visualize αGalNAc in an early bovine blastocyst (day 6) by using labeled DBA and serial optical sectioning. Three orthogonal planes show cell nuclei together with the F-actin cytoskeleton (A) and the staining pattern by DBA (B; BC: blastocoel; TB: trophoblast; ICM: inner cell mass; ZP: zona pellucida). Insets present a maximum intensity projection with all cell nuclei (A’) and DBA staining overlaid with a transmission image (B’). The boxed area is magnified in panels C–E (F-actin and DBA staining profiles are merged in D). DBA binding was detected in preferentially perinuclear vesicles or granules in both ICM (arrows in E) and TB, but neither at cell surfaces (arrowheads in D) nor at the ZP. Bars = 50 μm (10 μm in the enlarged area) (for details on the carbohydrate-binding specificity of DBA, please see Table 1; for technical details of staining protocols, confocal microscopy and image processing, please see [34]).

Fig. 3.

Lectin histochemistry on sections of fixed embryonic chicken kidney (metanephros; HH stage 40). Specific carbohydrate-inhibitable binding of labeled lectins is documented for PNA at glomeruli (gl), the basolateral part of proximal tubules (pt) and blood vessels (arrowhead) (A; inset: inhibition control with 200 mM Gal), for ECA at the apical part of pt (B, arrowheads), for LEA at distal tubules (dt; C), for PHA-L at the apical parts of pt (D), for MAA-I at low level signal intensity at this embryonic stage becoming stronger later in development (inset, HH stage 46) in collecting (white arrowhead) and distal tubules (E, inset (black arrowhead)) and for PSL in both dt and pt and also in blood vessels (arrowheads) (F). Bar = 50 μm (for details on specificity of lectins, please see Table 1; for details on staining protocol, please see [67, 70]).

Implicitly, the hypothesis that glycans are “multipurpose tools” in physiology [90] is giving research a clear direction, and by respective efforts “in recent years have we begun to appreciate how deeply glycan functions pervade all aspects of organismic biology, molecular biology, and biochemistry” [38]. In principle, drawing on the analogy of how messages encoded in nucleic acids and proteins are read, the specific association of the biochemical code signal with its receptor (reading) will be translated into post-binding responses. That an enriched fraction for a lectin of a bean (Phaseolus vulgaris), also called (phytohaem)agglutinin (PHA) and used for separating leukocytes from whole blood, “was found to be a specific initiator of mitotic activity” [80] and that “some of its [a plant lectin’s] biological activities are dependent upon its valence” [33] provided a guideline for the study of endogenous (tissue) lectins, from their structure to their function. In aggregate, the results of applying lectin cyto- and histochemistry illustrate the presence of one side of a glycan-lectin recognition system at an elaborate level. A search for binding partners of glycans, of note, can be carried out correspondingly.

VI.  Functional Glycan-Lectin Pairing

Cyto- and histochemically, glycans were used as sensor part of a labeled scaffold, and the application of (neo)glycoconjugates proved instrumental to trace carbohydrate-binding sites (for examples of the glycohistochemical detection of lectins by labeled (neo)glycoconjugates, please see [22, 23, 25]). The biochemical characterization of tissue lectins revealed a structural variety comparable to what is seen in plant lectins (for a gallery of respective folds, please see [105]). Expansion and diversification of gene families not only concern the sequence of CRDs, thereby generating phylogenetically related receptors that can differ in selecting ligands (for an example of how the canonical contact site for glycans can diverge within a lectin family, please see [52, 93]). Notably, the modular architecture of the lectins can also be affected. In fact, the combination of the various types of domains in a lectin lets the resulting multimodular (puzzle-like) sugar receptors serve many more purposes than those with a single CRD. For instance, the design of molecular tentacles by spacer elements in adhesion molecules or of multimeric aggregates by self-association modules (as molecular glue) for high-affinity association (docking) to serum glycoproteins in endocytosis or to foreign glycan signatures in innate immunity is crucial for the described functionality [17, 47].

The intimate correspondence between a glycan-encoded signal and its reader is graphically underscored by a lectin of host defence, i.e. intelectin-1, which senses the sugar Gal in its furanose (not pyranose) form that is present in microbes [118]. The versatility of signal recognition is not limited to the capacity to distinguish between glycans; it also extends to dynamic glycan rewriting. Tightly regulated enzymatic removal of a sialic acid from a glycan of a glycolipid or glycoprotein can unmask a cryptic signal at the right place and time. Upon cell activation or differentiation, the conversion of the hexasaccharide of the disialoganglioside GD1a into the GM1 pentasaccharide with one sialic acid (shown in the center of Fig. 1D) and its functional interplay with a tissue lectin (e.g. galectin-1) are emerging as a molecular switch in autoimmunity, axonal repair and neuroblastoma growth regulation [11, 28, 49]. Illustrating the breadth of impact of this type of molecular pairing on cellular biology, an overview of the range of functions of glycan-lectin interplay is given in Table 2. Obviously, the vocabulary of glycan-encoded words is at an early stage of being compiled into a dictionary that gives a translation into functionality (by lectin binding). For preparing such a dictionary, knowing details about lectin expression in cells and tissues will be enormously important, so the next part will focus on the contribution of cyto- and histochemistry with endogenous lectins to glycosciences.

Table 2.  Functions of animal and human lectins^a
Activity | Example of lectin
Recognition of stem region of N-glycans, a signal for ubiquitin conjugation when accessible in incorrectly folded glycoproteins | F-box proteins Fbs1 and Fbs2, which comprise the ligand-specific part of SCF^b ubiquitin ligase complexes
Molecular chaperones with dual specificity for Glc2/Glc1Man9GlcNAc2 and protein part of nascent glycoproteins in the ER | malectin/ribophorin I complex, calnexin, calreticulin
Targeting of misfolded glycoproteins with Man8-5GlcNAc2 as carbohydrate ligand to ER-associated degradation (ERAD) | EDEM1,2^c/Mnl1 (Htm1) (α1,2-mannosidases; in mammals active as lectins), Yos9p (MRH^d domain) in yeast, erlectin (XTP3-B^e) and OS-9^f in mammals
Intracellular transport of glycoproteins and vesicles, e.g. in apical, axonal and lysosomal routing | comitin, ERGIC53^g and VIP36^h (probably also ERGL^i and VIPL^j), galectins-3, -4 and -9, P-type lectins
Intracellular transport and extracellular assembly | non-integrin 67 kDa elastin/laminin-binding protein
Enamel formation and biomineralization | amelogenin
Inducer of membrane superimposition and zippering (formation of Birbeck granules) | langerin (CD207)
Molecular glue for high-order (glyco)protein association (packing) | eye lens-specific galectin (GRIFIN)
Cell type-specific endocytosis | cysteine-rich domain (β-trefoil) of the dimeric form of mannose receptor for GalNAc-4-SO4-bearing glycoprotein hormones in hepatic endothelial cells, dendritic cell and macrophage C-type lectins (mannose receptor family members (tandem-repeat type) and single-CRD^k lectins such as trimeric langerin/CD207 or tetrameric DC-SIGN^l/CD209), hepatic and macrophage asialoglycoprotein receptors, HARE^m, P-type lectins
Recognition of foreign glycans (β1,3-glucans, β1,2-mannosides, cell wall peptidoglycan, LOS^n and LPS^o), mycobacterial glycolipid or host-like epitopes | CR3^p (CD11b/CD18, Mac-1 antigen), C-type lectins such as collectins, DC-SIGN, dectin-1, Mincle and RegIIIγ (murine)^q or HIP/PAP (human), ficolins, galectins, immulectins, intelectins, Limulus coagulation factors C and G, siglecs, tachylectins
Recognition of foreign or aberrant glycosignatures on cells (including endocytosis or initiation of opsonization or complement activation) and of apoptotic/necrotic cells (glycans or peptide motifs) | collectins, C-type macrophage and dendritic cell lectins, CR3 (CD11b/CD18, Mac-1 antigen), α/θ defensins, ficolins, galectins, pentraxins (CRP, limulin), RegIIIγ (HIP/PAP), siglecs, tachylectins
Targeting of enzymatic activity in multimodular proteins | acrosin, Limulus coagulation factor C, laforin, β-trefoil fold ((QxW)3 domain) of GalNAc-Ts^r involved in mucin-type O-glycosylation (often found in microbial glycosylhydrolases for plant cell wall polysaccharides, then termed carbohydrate-binding modules)
Induction or suppression of effector release (H2O2, cytokines etc.) | chitinase-like YKL-40, galectins, I-type lectins (e.g. CD33 (siglec-3), siglecs-7 and -9), selectins and other C-type lectins such as CD23, BDCA2 and dectin-1, Toll-like receptor 4
Modulation of enzymatic activities in modular proteins/receptor endocytosis via lattice formation | mannan-binding lectin (acting on meprins), galectins
Coordination of autophagy/repair mechanisms by sensing endomembrane damage and ‘calling for help’ | galectins-3, -8 and -9
Cell growth control, induction of apoptosis/anoikis and axonal regeneration, anti- and pro-inflammatory regulation with/without modulating gene expression | amphoterin and other heparin-binding proteins, cerebellar soluble lectin, chitinase-like lectins, C-type lectins, galectins, hyaluronic acid-binding proteins, siglecs (e.g. CD22 and CD33)
Cell migration and routing | galectins, hyaluronic acid-binding proteins (CD44, hyalectans/lecticans, RHAMM^s), I-type lectins, selectins and other C-type lectins
Cell–cell interactions | galectins, gliolectin, I-type lectins (e.g. siglecs, N-CAM^t, P0 or L1), selectins and other C-type lectins such as DC-SIGN or macrophage mannose receptor
Cell–matrix interactions | calreticulin, discoidin I, galectins, heparin- and hyaluronic acid-binding lectins including hyalectans/lecticans
Matrix network assembly | galectins (e.g. galectin-3/hensin), non-integrin 67 kDa elastin/laminin-binding protein, proteoglycan core proteins (C-type CRD and G1 domain of hyalectans/lecticans)

^a Adapted from [68], extended and modified; ^b Skp-1-Cul1-F-box protein complex; ^c ER degradation enhancing α-mannosidase-like protein; ^d mannose-6-phosphate receptor homology; ^e XTP3-transactivated gene B precursor; ^f osteosarcoma 9; ^g ER-Golgi intermediate compartment protein (lectin) (MW: 53 kDa); ^h vesicular-integral (membrane) protein (lectin) (MW: 36 kDa); ^i ERGIC-53-like protein; ^j VIP-36-like protein; ^k carbohydrate recognition domain; ^l dendritic cell-specific ICAM-3-grabbing nonintegrin; ^m hyaluronan receptor for endocytosis; ^n lipooligosaccharide; ^o lipopolysaccharide; ^p complement receptor type 3; ^q member of regenerating (reg) gene family of secreted proteins; ^r UDP-GalNAc: polypeptide N-acetylgalactosaminyltransferases; ^s receptor for hyaluronan-mediated motility; ^t neural cell adhesion molecule.

VII.  Endogenous Lectins in Cyto- and Histochemistry

By working hand in hand, scouring genomes for lectin genes and characterizing their products are filling the remaining gaps in the lists of lectin families and their members, setting the stage for comprehensive immunocyto- and -histochemical mapping. Because a CRD apparently occurs not in a single protein but in several proteins of a lectin family, network analysis, that is the detailed comparative analysis of the expression of each protein, appears advisable. As graphically illustrated in Fig. 4 for a human tumor cell line and in Fig. 5 for tissue sections, members of a lectin family, here tested for the adhesion/growth-regulatory galectins, are not uniformly expressed but appear to display individual localization profiles with overlaps and marked differences.

Fig. 4.

Galectin immunocytochemistry on fixed and permeabilized human colorectal adenocarcinoma cells (Caco-2) stained in 2-step procedures with concanavalin A (red) to present cell morphology (A, C, D, F) and specific anti-galectin antibody (green; B, C, E, F), revealing intracellular presence of the galectins-3 (A–C) and -8 (D–F) at different levels of expression. Nuclei were stained with DAPI (blue; A, C, D, F). Bar = 75 μm (for details on detection of galectin expression in cell lines by RT-PCR analysis, please see [62]).

Fig. 5.

Galectin immunohistochemistry on sections of fixed adult chicken cornea of eye and chicken kidney using specific anti-chicken galectin (CG) antibodies. By light microscopy, CG-3 and CG-1A (inset) were found in the corneal epithelium (asterisks in A and inset to A), CG-1B strongly in fibroblasts (arrows) and fibrocytes (arrowheads) in the stroma (st) (B, asterisks mark the negative corneal epithelium). Positive (right inset to B: strong positivity in central lens fiber cells, white arrowhead) and negative (left inset to B: cornea) controls with ocular lens-specific chicken galectin-related inter-fiber protein (C-GRIFIN) exclude antigen-independent binding. By two-color fluorescence microscopy (green, red), differences and overlap of expression profiles in embryonic chicken kidney (at HH stage 26, metanephros) were detected in pairwise comparison (C, D) in the epithelial lining of collecting ducts (cd), collecting tubules (ct) and proximal tubules (pt). Bars = 20 μm (for details on reagents, staining protocols and technical aspects, please see [26, 27, 69, 70]).

The validity of this conclusion has been further solidified by comprehensive examination of expression profiles of galectins during the course of differentiation from progenitor cells of ectodermal origin to the mature eye lens (Table 3) and by the respective mapping for six galectins in organs of mice [79]. These insights give reason to suggest the fundamental concept of a functional interplay between lectins in situ that spans from additive cooperation (as seen for galectins-1, -3 and -8 in the pathogenesis of osteoarthritis [114, 117]) to strict antagonism (as seen for galectins-1 and -3 in neuroblastoma and pancreatic carcinoma growth regulation in vitro [58, 98]). The mentioned glycan remodeling can result in teamwork between members of the same lectin group (galectin-8 as GD1a receptor, galectins-1, -2, -3 and -7 as GM1 receptors) or of more than one lectin family that will sense the removal of sialic acid, in this case an axonal siglec (siglec-4, also called myelin-associated glycoprotein [101]) as GD1a receptor and a galectin as GM1 receptor. One should therefore be open to broadening the scope of analysis accordingly.

Table 3.  Immunohistochemical profiling of CG presence during embryogenesis of eye lens and two post-hatch ages
Stage^b/cell type CG
CG-1A CG-1B CG-2^a CG-3 CG-8 C-GRP C-GRIFIN^a
HH stage 19 (ca. 68–72 h)
 anterior epithelium of the lens vesicle^c ++ −/(+) +/++++^h (+)/+ −/(+) +++ (+)
 posterior part of the lens vesicle ++ −/(+) + − → + −/(+) +++ +/++
 presumptive annular pad^d +/++ −/(+) +/++++^h (+) −/(+) +++ (+)
HH stage 23 (ca. 3.5–4 days)
 anterior epithelium of the lens vesicle^c ++ −/(+) +/+++^h (+)/+ −/(+) +++
 primary lens fiber cells +/++ −/(+) + (+)/++ −/(+) +++ +++
 presumptive annular pad^d +/++ −/(+) +/++^h −/+ −/(+) +++
HH stage 31 (ca. 7 days)
 central anterior epithelial cells ++/+++ +/+++^h ++/+++ +/++ +++
 equatorial epithelial cells^e ++/+++ +/+++^h ++/+++ +/++ +++
 nucleated fiber cells^f +/++ −/(+) +/++ ++/+++ + ++ ++
 cortical lens fiber cells ++/+++ −/(+) + +/++ + +++ ++++
 nuclear lens fiber cells ++/+++ −/(+) ++ + + ++ +++
HH stage 39 (ca. 13 days)
 capsule
 central anterior epithelial cells +++ ++/+++^h ++++ ++ +++ −/+
 equatorial epithelial cells^e +++ ++/+++^h ++++ +/++ +++ −/+
 nucleated lens fiber cells^f +/++ −/(+) ++ ++++ + ++ ++
 cortical lens fiber cells ++/+++ −/(+) + + −/+ +++ ++++
 nuclear lens fiber cells ++/+++ −/(+) ++ (+) −/+ +/++ +
3-month-old eye (adult)^g
 capsule
 central epithelial cells ++++ ++ ++
 equatorial epithelial cells^e ++++ ++ ++
 nucleated fiber cells^f + (+) ++++ + +
 cortical lens fiber cells ++ (+) ++++ +
 nuclear lens fiber cells ++ +→+++ + +→+++

^a From [26, 27]; ^b according to the Hamburger and Hamilton classification [35, 36]; ^c precursor of central epithelial cells; ^d precursor of equatorial epithelial cells; ^e annular pad; ^f transitional zone; ^g from [69]; ^h supranuclear. Signal intensity was semiquantitatively grouped into the following categories: −: no staining; (+): very weak but significant staining; +: weak staining; ++: moderate staining; +++: strong staining; ++++: very strong staining. From [29], with permission, and shortened to present data at four instead of eight HH stages.
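For readers who wish to compare entries of Table 3 across stages, the semiquantitative categories of the footnote can be turned into ordinal ranks. The following is a minimal sketch in Python, not a method used by the authors: the names `SCORE_RANK` and `parse_score` and the numeric ranks 0 to 5 are assumptions introduced here for illustration, and the ranks only preserve the order of the categories (they are not a measure of signal intensity).

```python
# Ordinal ranks for the six staining categories of the Table 3 footnote.
# The numbers are an illustrative assumption; only their order is meaningful.
SCORE_RANK = {
    "−": 0,     # no staining (U+2212 minus sign, as printed in the table)
    "(+)": 1,   # very weak but significant staining
    "+": 2,     # weak staining
    "++": 3,    # moderate staining
    "+++": 4,   # strong staining
    "++++": 5,  # very strong staining
}


def parse_score(cell: str) -> tuple[int, int]:
    """Return (lowest, highest) rank for one table cell (hypothetical helper).

    Accepts single grades ("++") and ranges written with "/" or "→"
    ("+/++", "− → +"); footnote markers such as "h" and spaces are ignored.
    """
    # Treat an ASCII hyphen like the printed minus sign, and "→" like "/".
    text = cell.replace("-", "−").replace("→", "/")
    # Keep only the characters that make up grades and range separators.
    text = "".join(ch for ch in text if ch in "−()+/")
    ranks = [SCORE_RANK[part] for part in text.split("/")]
    return min(ranks), max(ranks)


# CG-3 in nucleated fiber cells, taken from rows that list all seven proteins.
hh31 = parse_score("++/+++")  # HH stage 31
hh39 = parse_score("++++")    # HH stage 39
print(hh31, hh39)  # → (3, 4) (5, 5)
```

Returning a (lowest, highest) pair rather than a single number keeps the ranges reported in the table intact, so that an entry such as "− → +" is not collapsed into an average that the original scoring does not support.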

Moreover, localization profiles are eminently valuable sources of information on unsuspected skills of lectins. The cytoplasmic presence of galectins, illustrated in Fig. 4 and Fig. 5, is a prerequisite for their ability to sense damage of intracellular membranes and to act as initiators of autophagy or repair [41]. Non-classical secretion lets galectins enter the extracellular space, opening the way to becoming effectors on the cell surface and in the extracellular matrix [43, 61, 99]. Similarly, intracellular glycoprotein transport and delivery guided by glycans as postal codes, as mentioned above, call for lectins to meet their cargo inside cells. What the interaction of the Man-6-phosphate signal with the P-type lectins accomplishes for lysosomal enzymes, the binding of the sulfatide headgroup shown in Fig. 1C (bottom) and of N-glycans with N-acetyllactosamine (LacNAc) clusters to galectin-4 accomplishes in apical and axonal glycoprotein routing [108, 115]. Writing about LacNAc (and its appearance in polyLacNAc repeats in N- and O-glycans and in keratan sulfate) as binding partner of galectins, especially galectins-1 and -9, brings us to an obvious extension of (plant) lectin cyto- and histochemistry. With these tissue lectins in hand, cytochemical localization of galectin ligand(s) is possible, as documented in Fig. 6. Labeled tissue lectins thus facilitate studies on segregating the glycophenotype into glycan(s) that bind an endogenous receptor, and the cell and tissue staining patterns identify glycans with functional potential by this specific recognition.

Fig. 6.

Galectin cytochemistry to visualize binding sites for human galectin-1 (Gal-1) and the N-terminal CRD of human galectin-9 (Gal-9N) in bovine preimplantation embryos by serial optical sectioning. Panels A–D present images of a late 4-cell embryo. A maximum intensity projection shows the cell nuclei together with the F-actin cytoskeleton (A). Gal-1 binding is demonstrated in a single optical section together with DNA and actin filaments (B) and alone without (C) and with gamma correction (C’). The boxed area in B is enlarged and presented in panel D. Gal-1 labeled evenly distributed cytoplasmic granules or vesicles (arrows) and delineated the cell surface (arrowheads). The bottom part of the figure (E–I) presents a hatching blastocyst (at day 8). Cell nuclei and F-actin cytoskeleton are shown by a maximum intensity projection (E). Binding of Gal-9N is demonstrated on single optical sections through the ICM (F, H) and the hatching part of the embryo (I), and this also by a complete (G) and a slightly rotated partial maximum intensity projection (G’) presenting the opening in the zona pellucida (arrow in G’). The ICM and the hatching part of the embryo are shown at higher magnification in panels H and I (gamma-corrected overlay and single-channel images). Gal-9N intensely stained the ZP. In the non-hatched part of the blastocyst, only some very weak signals were detected in the cytoplasm (arrows in H). In contrast, the already hatched trophoblast cells showed intense staining of cytoplasmic vesicles (arrows in I) and of the outer cell surface (arrowheads in I). Bars = 50 μm (20 μm in the enlarged area) (for explanation of abbreviations, please see legend to Fig. 2; for details on preparation and glycan-binding properties of these two galectins, especially to polyLacNAc chains, please see [24, 44, 73]).

In principle, a tissue section can be considered as a platform that presents cellular glycoconjugates in their natural diversity and topology, including glycan branching and clustering. Lectin cyto- and histochemistry with tissue lectins will answer pertinent questions on what binding profiles look like and how they are regulated. As shown in Fig. 7, members of the galectin family can interact differently with cellular glycan presentation when studied alone or in combination. The routes of diversification from an ancestral CRD to a family, in terms of the structure of its contact site and of modular architecture, thus appear to lead to a toolbox of effectors, a broad area to be explored. In addition to the systematic testing of lectins within families, the engineering of variants of the modular design (termed lectinology 4.0 [66]) will provide the probes to discern how the protein architecture (i.e. the same CRD presented in different structural forms, e.g. di- or tetramer) affects staining profiles (and functions). Respective studies have recently been initiated for galectins [30, 60].

Fig. 7.

Galectin histochemistry on sections of fixed chicken kidney (A, B; mesonephros; HH stage 35) and bone (C, D: HH stage 35 and adult). By light microscopy, specific carbohydrate-inhibitable binding (inset to A; 200 mM lactose) and differences of staining profiles (documented for distal tubules (dt), glomeruli (gl) and proximal tubules (pt)) for two chicken galectins, i.e. CG-1A and CG-2, were obtained (A, B). By two-color fluorescence microscopy (green, red), signal overlap in chondrocytes (arrows) and distinct staining by a single galectin, i.e. CG-3, in the osteoid layer (arrowheads), were observed (C). In adult bone, inner/outer zones of periosteum were labeled differently by the two chicken galectins (arrowheads), whereas signal overlap occurred in the bone matrix of trabeculae (arrow) (D). Bar = 50 μm (for details on reagents, staining protocol and technical aspects, please see [48, 70]).

VIII.  Conclusions

The concept of considering nucleotides and amino acids as the first two alphabets of life, that is the biochemical letters that store biological information in oligo- and polymers, has led us to regard carbohydrates as the third system of symbols. Oligo- and polysaccharides are ubiquitous in Nature, and the special chemical properties of sugars, together with the sophisticated synthetic machinery for glycans, make an unsurpassed extent of structural diversity at the level of oligomers possible. As a consequence, cells and tissues are teeming with glycan-encoded information and potential for interactions with lectins. Fittingly, more than a dozen families of readers of these compact code words (called lectins) have evolved. Studied first as laboratory tools for glycophenotyping, lectins from plants proved to be a great asset for verifying the hypothesis of broad-scale diversity and dynamic spatiotemporal regulation of glycans.

The growing understanding of how well-stocked the toolbox of lectins in vertebrates is makes the need for systematic immunocyto- and histochemical mapping of expression profiles obvious. The emerging correlations between lectin expression and cellular processes will inspire functionally oriented studies, using cell biological test systems up to tissue cultures and animal models in healthy and diseased states. Synthetic surface programming of vesicle-like nanoparticles complements this toolbox by allowing glycan structures and their local density to be rationally manipulated for testing [20, 76]. In parallel, the application of labeled endogenous lectins, in analogy to plant lectin cyto- and histochemistry, can enormously help to give distinct aspects of the glycophenotype a physiological meaning. An example, i.e. the role of sulfatide with its 3-O-sulfated Gal headgroup (please see Fig. 1C (bottom) for structure) in galectin-4-mediated axonal glycoprotein (L1) routing, is shown in Fig. 8. The cause of mucolipidosis II (I-cell disease), i.e. impaired generation of the mentioned Man-6-phosphate in N-glycans for P-type lectin-mediated routing of lysosomal glycoproteins, highlights the connection of a glycan to a disease [59]. Indeed, “evidence clearly indicates that glycans represent a largely untapped resource for biological discovery as well as unanticipated therapeutic opportunities” [1]. In this sense, the vocabulary of glycan structures (cellular glycome) is to be assembled into a dictionary (with the corresponding functional aspect(s) as translation), hereby cracking the sugar code.

Fig. 8.

Impact of reduction of sulfatide presence on axonal clustering of the NCAM-L1 glycoprotein by impairing its galectin-4-mediated routing. Treatment of rat embryonic hippocampal neurons with 75 mM sodium chlorate, an inhibitor of ATP-sulfurylase that carries out the first step of 3'-phosphoadenosine-5'-phosphosulfate (PAPS) synthesis (the sulfate donor for generating sulfatide; please see Fig. 1C), leads to a significant decrease in the number of L1 clusters (red) and in their area (please compare clusters indicated by arrowheads in insets of A, B). Neurons were immunostained for α-tubulin (green) to visualize the cell structure. Bar = 25 μm (A: control, B: treatment; for details, please see [115]).

IX.  Conflicts of Interest

The authors declare that there are no conflicts of interest.

X.  Acknowledgments

We wish to express our sincere appreciation for the thorough review process and the valuable comments of the reviewers.

XI. References
 
© 2021 The Japan Society of Histochemistry and Cytochemistry

This is an open access article distributed under the Creative Commons License (CC-BY-NC), which permits use, distribution and reproduction of the articles in any medium provided that the original work is properly cited and is not used for commercial purposes.
https://creativecommons.org/licenses/by-nc/4.0/