2025, Volume 100, Article ID: 24-00143
Nucleosomes are complexes of DNA and histone proteins that form the basis of eukaryotic chromatin. Eukaryotic histones are descended from archaeal homologs; however, how this occurred remains unclear. Our previous genetic analysis of the budding yeast nucleosome identified 26 histone residues conserved between Saccharomyces cerevisiae and Trypanosoma brucei: 15 that are lethal when mutated and 11 that are synthetically lethal with deletion of the FEN1 nuclease. These residues are partially conserved in nucleosomes of a variety of giant viruses, allowing us to follow the route by which they were established in the LECA (last eukaryotic common ancestor). We analyzed yeast nucleosome genetic data to generate a model for the emergence of the eukaryotic nucleosome. In our model, histone H2B-H2A and H4-H3 doublets found in giant virus nucleosomes facilitated the formation of the acidic patch surface and nucleosome entry sites of the eukaryotic nucleosome, respectively. Splitting of the H2B-H2A doublet resulted in the H2A variant H2A.Z, and subsequent splitting of the H4-H3 doublet led to a eukaryote-specific domain required for chromatin binding of H2A.Z. We propose that the LECA emerged when the newly split H3 N-terminus horizontally acquired a common N-tail found in extinct pre-LECA lineages and some extant giant viruses. This hypothesis predicts that the emergence of the H3 variant CENP-A and the establishment of CENP-A-dependent chromosome segregation occurred after the emergence of the LECA, implying that the root of all eukaryotes is assigned within Euglenida.
Histones are the most highly conserved eukaryotic proteins, suggesting that the histones of the last eukaryotic common ancestor (LECA) should be almost identical to those found in extant eukaryotes. In extant eukaryotes, the nucleosome is composed of a histone octamer (comprising two H2A/H2B dimers and a [H3/H4]2 tetramer) that wraps 146 bp of DNA (Luger et al., 1997). Although evidence suggests that eukaryotic histones are descended from archaeal histones (Mattiroli et al., 2017), how the simpler archaeal nucleosome transformed into the eukaryotic nucleosome remains unclear. Some giant viruses have a variety of histones with variable configurations including singlet, doublet, triplet or quadruplet (H2A, H2B, H3, H4, H2B-H2A, H2A-H2B, H4-H3, H3-H4, H2B-H2A-H3 or H2B-H2A-H3-H4) (Talbert et al., 2022; Irwin and Richards, 2024). In particular, giant virus nucleosomes that contain two H2B-H2A doublets and two H4-H3 doublets, and are thus remarkably similar to the eukaryotic nucleosome, have been observed (Liu et al., 2021; Valencia-Sánchez et al., 2021), suggesting that the eukaryotic nucleosome evolved alongside those of giant viruses. However, the evolutionary path of the eukaryotic nucleosome from (or to) the giant virus nucleosome remains unclear.
While providing important insights into the likely origin of the eukaryotic nucleosome, previous evolutionary work on histone biology and the LECA has raised several further questions. First, the phylogenetic root of all eukaryotic lineages (LECA) is still debated (Gabaldón, 2021). Furthermore, Trypanosoma brucei does not have a CENP-A-dependent chromosome segregation system (Akiyoshi and Gull, 2014; Tromer et al., 2021). Thus, either the LECA had a CENP-A-dependent chromosome segregation system and T. brucei lost it, or the LECA had no CENP-A system and it was gained subsequently. It is not yet clear which of these two possibilities is correct. Histones possess an unstructured N-terminus that protrudes from the nucleosome and is often subject to posttranslational modification that regulates the degree of compaction of chromatin; however, in T. brucei and Giardia lamblia the amino acid sequences of the H3 N-tails are poorly conserved compared with those in other eukaryotes, including Saccharomyces cerevisiae (Postberg et al., 2010). The reason for this difference remains elusive.
Another interesting question is the origin of the H2A variant H2A.Z, as the G. lamblia genome does not harbor this variant (Talbert et al., 2019), suggesting either that the LECA had H2A.Z and G. lamblia lost it or that the LECA had no H2A.Z and subsequently gained it. Again, it is not yet clear which of these two possibilities is correct. Finally, it is unclear whether the H2B-H2A and H4-H3 doublets found in giant viruses are the ancestor or the descendant of the extant H2A, H2B and H3, H4 singlets (Talbert et al., 2022). Although phylogenetic metagenome analyses of histone genes imply that the H2B-H2A and H4-H3 doublets are ancestors of the eukaryotic nucleosome (Irwin and Richards, 2024), it remains elusive how H2B-H2A and H4-H3 are mechanistically transformed into separate histone singlets.
Using yeast genetics, we have previously identified 15 histone residues that are lethal when mutated (Sakamoto et al., 2009) and 11 that are synthetically lethal in combination with the loss of the Okazaki fragment-processing enzyme FEN1 (Nakabayashi and Seki, 2023). We define here a ‘functional residue’ as a residue whose mutation causes a change in phenotype. All of the above 26 functional histone residues are perfectly conserved between S. cerevisiae and T. brucei (Nakabayashi and Seki, 2023) and thus can be used as an evolutionary searchlight to address the above six questions. We performed a systematic analysis of the existing yeast genetics data and present a detailed hypothesis for the evolution of the nucleosome from archaea to extant eukaryotes.
Analysis of the crystal structure of three histone B homodimers from Methanothermus fervidus (HMfB) bound to 90 bp of DNA has provided valuable insights into the structure of the archaeal nucleosome (Fig. 1A(a)) (Mattiroli et al., 2017). This analysis, combined with the solved solution structure of a human H2A/H2B heterodimer (Fig. 1A(b)) (Moriwaki et al., 2016), demonstrated that a histone fold (HF) is highly conserved and commonly found in both the archaeal nucleosome and the human H2A/H2B heterodimer (Fig. 1A(c)). However, major structural differences between Figure 1A(a) and (b) can be found in the regions flanking the HF. The H2A αN, H2A βC and docking domain, all found in the human nucleosome, are disordered (Fig. 1A(c)). Moreover, the L3 region of H2B is flexible, leading to H2B αC rotation (Fig. 1A(c)).
Since H2B-H2A and H2A-H2B histone doublets are found in some giant viruses (Talbert et al., 2022), we theoretically fused human H2A and H2B as shown in Figure 1B(a) and (b). There is good evidence that these histone doublets would be more structurally stable than the H2A/H2B heterodimer. Indeed, exploitation of the structural stability of the artificially fused H2B-H2A doublet and H2B-H2A.Z (H2A variant) doublet has been used several times to facilitate structural analyses of these proteins (Fig. 1B(c)–(e)). Moreover, both H2B-H2A and H2B-H2A variant doublets are functional in budding yeast, chicken and human cells (Fig. 1C(f)) (Nakabayashi et al., 2014, 2020; Ruiz and Gamble, 2018; Kitagawa et al., 2021), while a H2B-H2A doublet rescues the lethality of a double gene deletion mutant of H2A and H2B (Fig. 1C(f)1) (Nakabayashi et al., 2014). Thus, we suggest that the H2B-H2A doublet is more stable than the H2A/H2B heterodimer (Fig. 1C(a)).
Structurally, eukaryotic nucleosomes (Fig. 1C(b)) (Luger et al., 1997) are very similar to giant virus nucleosomes (Fig. 1C(c)(d)) (Liu et al., 2021; Valencia-Sánchez et al., 2021). Although it has not yet been determined whether H2B-H2A is the ancestor (Fig. 1C(e)) or the descendant (Fig. 1C(f)), these analyses suggest that the former is more likely. Moreover, a recent systematic phylogenetic analysis of 258 histones of 168 giant viral metagenomes revealed that viral histone doublets originated in stem eukaryotes and that nucleosome evolution proceeded through histone doublet intermediates (Irwin and Richards, 2024). Thus, we tentatively subjected the ‘H2B-H2A ancestor hypothesis’ to further consideration (Fig. 1C(e)).
Evolution of the eukaryotic acidic patch surface through the H2B-H2A doublet
To explore the H2B-H2A ancestor hypothesis, we followed the evolutionary path from archaeal to eukaryotic histones. Analyses demonstrate that some archaea have multiple histone genes (Nishida and Oshima, 2017) and that archaeal histones sometimes have both N- and C-tails on the outside of the HF (Mattiroli et al., 2017). Histone doublets are also found in some archaea (Talbert et al., 2019). Taken together, the evidence suggests that prototypes of the extant histones H2A, H2B, H3 and H4 existed in some archaea, followed later by proto-H2A and proto-H2B, both of which have long N- and C-tails, fusing to give a H2B-H2A doublet (Fig. 2A ①). The long linker region between H2B and H2A would allow the formation of two α-helixes, H2B αC and H2A αN. Notably, the rotational ability of the extant eukaryotic H2B αC (Fig. 1A(c)) is fixed by a mutual interaction between H2B αC and H2A αN in the nucleosome. The surface created by the interaction between H2A α2 and H2B αC could have formed a primitive landing pad for primitive nucleosome-binding proteins (Fig. 2B ②③).
It is possible that the H2B-H2A doublet had an ancient long, disordered C-terminus with a downstream region adjacent to the H2A HF that would be able to interact with both H2A α2 and H2B αC, leading to H2A αC and a primitive acidic patch on the ancient nucleosome (Fig. 2A ④). Of the 26 conserved histone residues we have previously identified (Nakabayashi and Seki, 2023), seven are found on the acidic patch (Fig. 2B). Since the acidic patch of extant nucleosomes is an excellent landing pad for half of all nucleosome-interacting factors (Skrajna et al., 2020), it is likely that a strong Darwinian driving force would have facilitated mutual interaction between H2A α3 and H2A αC (Fig. 2A ⑤). Thus, the H2B-H2A doublet, but not a H2A/H2B heterodimer or a H2A-H2B doublet, would form a stable acidic patch (Fig. 2C), strongly supporting the H2B-H2A ancestor hypothesis. In this scenario, once the progenitor of the acidic patch was established, co-evolution between the acidic patch and its interactors would have occurred independently in a variety of pre-LECA lineages, reflected in the great variation of residues found in different giant viruses (Fig. 2D ⑥).
The H4-H3 doublet may be the ancestor of extant histones
We next asked whether the H4-H3 doublet is also likely to be an ancestor of the extant heterodimer by further examining the theoretical stability of these complexes. Either absence of a H2A/H2B heterodimer (PDB: 7X57, 2IO5, 5BS7) or lack of interaction between the H3 αN and the H2A' docking domain (PDB: 6M4G) leads to disordering of the H3 αN (Fig. 3A(a)), which usually forms an α-helix in the nucleosome. Moreover, the H4 βC, which interacts with the H2A' βC in the nucleosome and forms a short β-sheet, is disordered in the absence of the H2A/H2B dimer (PDB: 7X57, 5BS7) (Fig. 3A(a)). Notably, in the two available structures of the giant virus nucleosome, there is no interaction between the H2A' docking domain and the H3 αN (PDB: 7LV8, 7N8N). Thus, we predict that in the giant virus nucleosome, the H3 αN is disordered when H4-H3 is artificially split into H4 and H3 monomers (Fig. 3A(b)).
Theoretical fusion of H3 and H4 monomers led to the formation of H4-H3 and H3-H4 doublets (Fig. 3A(c)(d)). Given that H3 αN and H4 βC are only structurally stable in the H4-H3 doublet (Fig. 3B), we predict that H4-H3 is the ancestor of the extant eukaryotic H3/H4 heterodimer. Besides histone residues present in HRD-I (homologous recombination domain I) (Nakabayashi and Seki, 2023) (Fig. 2B), HRD-II (Fig. 3C(a)), HRD-III (Fig. 3C(b)) and HRD-IV (Fig. 3C(c)), related residues are shown. Among those residues, four conserved residues (H3-L48, -I51, -F54 and -Q55) on H3 αN (Fig. 3C(a)) and one (H4-Y98) on H4 βC (Fig. 3C(c)) could be established only in the H4-H3 doublet (Fig. 3A(c)). Thus, it seems likely that the H4-H3 doublet is the ancestor (Fig. 3A(e)) rather than the descendant (Fig. 3A(f)).
Evidence for horizontal transfer of the H3 N-terminal region
Eighteen of the conserved residues (including the five residues discussed above) previously identified on H3 or H4 (Nakabayashi and Seki, 2023) are perfectly conserved in Phycodnavirus H3 or Bracovirus H4 (Supplementary Fig. S1). Thus, these viral histones may have been derived from the extant eukaryote. By contrast, these 18 residues show mosaic conservation in a variety of giant virus histones (Supplementary Fig. S1), indicating that such giant viruses were derived from pre-LECA lineages (Irwin and Richards, 2024). Since three of the histone residues that are lethal when mutated (H3-L48, -I51, -Q55) are localized on the disordered H3 αN (Fig. 3A(a)), we next compared the N-terminal region (including H3 αN) just upstream of the H3 HF in a variety of giant virus H3 histones (Fig. 4).
Of the identified histone residues that are lethal when mutated in S. cerevisiae, three (L/I/V48, I/V51 and Q55) are also present in seven giant viruses (Fig. 4). Since, in yeast, these three residues, together with -R52, are involved in the interaction between the H3 αN and the H2A' docking domain (Supplementary Fig. S2), this interaction may have co-evolved among pre-LECA lineages and giant viruses. In nucleosomes of Marseillevirus and Melbournevirus, which do not have such an interaction (Fig. 3A(b)), only one residue, H3-V51 (corresponding to H3-I51 of S. cerevisiae), is conserved. Splitting of the H4-H3 doublet in the absence of interaction between the H3 αN and the H2A' docking domain is likely to lead to disordering of the H3 αN (Fig. 3A(b)). Since both Medusavirus and its derivative Medusavirus stheno have free H3 N-termini, we predict that in the nucleosomes of these viruses, interaction would occur between the H3 αN and the H2A' docking domain. Given that only one (H3-Q55) out of the three lethal residues is found in Medusavirus and Medusavirus stheno, it seems likely that the lineages of these viruses are very distant from pre-LECA lineages.
Surprisingly, residues H39–A47 of S. cerevisiae histone H3, which lie just upstream of H3-L48 (a residue that is lethal when mutated), are also conserved in Medusavirus and Medusavirus stheno, as well as in five other giant viruses (Fig. 4). The predicted H3 N-tail from virus sequences and S. cerevisiae is similar to the H3 N-tail previously reconstructed using a variety of extant eukaryotes (Postberg et al., 2010). Notably, the H3 N-tail of S. cerevisiae, but not those of giant viruses, contains a GGK sequence (Fig. 4).
Comparison of nucleosome structures between the budding yeast canonical nucleosome and the human CENP-A-containing nucleosome demonstrates a clear structural boundary between H3-A47 and -L48 (Fig. 5A). An interaction between the H3 αN and the H2A' docking domain is a prerequisite for maintaining the structural and functional integrity of the H3 αN during splitting of the H4-H3 doublet (Fig. 5B(a)). Following splitting of the H4-H3 doublet, the newly formed H3 αN would be structurally identical to the extant CENP-A αN (Fig. 5B(b)). Since some residues in the linker region between H4 and H3 in the doublet of Marseillevirus and Melbournevirus interact with nucleosomal DNA (Fig. 4) (PDB: 7LV8, 7N8N), it is possible that the newly formed H3 αN would be more stable (Fig. 5B(c)).
Although conservation of the 11 H3 residues (Supplementary Fig. S1) is poor and less than perfect in Medusavirus (55%) and Clandestinovirus (91%), respectively, there appears to be good conservation between the H3 N-terminal sequences of these viruses and those of budding yeast (Fig. 4). One explanation for this is that the common evolutionarily beneficial H3 N-terminal sequence always recombined into an emerging free H3 following splitting of the H4-H3 doublet (Fig. 5C). Since co-evolution of pre-LECA cells and giant viruses by frequent horizontal transfer of histone genes during eukaryogenesis is predicted (Irwin and Richards, 2024), we imagine that homologous recombination occurred between histone genes of pre-LECA lineages and those of giant viruses (Fig. 5C). Such recombination enables a one-turn extension of the α-helix of the H3 αN and tethering of the H3 αN onto nucleosomal DNA (Fig. 5C, right purple box). Alongside their structural benefits, residues H39–A47 of histone H3 are highly functional in budding yeast (Fig. 4, Supplementary Figs. S3A(a)–(c) and S3B [H3 sequence]) (Sakamoto et al., 2009). Thus, we posit that a strong Darwinian driving force would have allowed for horizontal recombination of the same H3 N-terminus repeatedly during nucleosome evolution.
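The two percentages quoted above can be related back to residue counts. As a minimal sketch, assuming each percentage is simply the number of conserved residues divided by the 11 H3 residues and rounded to the nearest integer (the counts themselves are not stated in the text, and the function name is only illustrative), the following Python snippet lists the counts consistent with each value:

```python
def counts_matching(percent: int, total: int) -> list[int]:
    """Return every count k (0..total) whose rounded percentage equals `percent`."""
    return [k for k in range(total + 1) if round(100 * k / total) == percent]


# The 11 conserved H3 residues referred to in the text.
TOTAL_H3_RESIDUES = 11

# Assumption: percentage = conserved / 11, rounded to the nearest integer.
medusavirus = counts_matching(55, TOTAL_H3_RESIDUES)
clandestinovirus = counts_matching(91, TOTAL_H3_RESIDUES)

print("Medusavirus:", medusavirus)
print("Clandestinovirus:", clandestinovirus)
```

Under that assumption, 55% and 91% would correspond to 6 and 10 of the 11 residues, respectively.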
The secondary histone doublet
Primary H4-H3 doublet splitting (Fig. 6A(a)(b)) could have occurred and been followed by the first recombination event, leading to a H3 N-terminus identical to those found in Medusavirus and Clandestinovirus (Fig. 6A(c)). A secondary recombination event would have created the Medusavirus stheno H3-H4 doublet (Fig. 6B(a)). Similarly, a secondary recombination event could have given rise to a H4-H3 doublet (Fig. 6B(b)), a H2B-H2A-H3 triplet (Fig. 6B(c)) and a H2B-H2A-H3-H4 quartet (Fig. 6B(d)), while maintaining the ‘budding yeast H3 K37-P38-H39-R40-Y41-K42-P43-G44-T45-V46-A47’-like sequences in the H3 of all oligomers (Fig. 6B). Thus, the evolutionary conservation demonstrated in Figure 6B(a)–(d) provides good evidence for the predicted structural and functional benefits of such a H3 sequence (Fig. 5C).
Since we have defined the H2B-H2A doublet as the primary state of H2A and H2B molecules (Figs. 1 and 2), we postulate that the H2A and H2B of LECA and Medusavirus were derived from splitting of the H2B-H2A doublet (Figs. 6C(a)(b)). It would therefore follow that the H2A-H2B doublet found in two giant viruses (Fig. 6C(c)) should be a secondary recombination product derived from the two free H2A and H2B molecules. During a second recombination event (Figs. 6B and 6C), the N-tails of H2A, H2B, H3 and H4 may always have been sacrificed, suggesting that each histone tail was not structurally or functionally essential for the ancient nucleosome (Fig. 6D). Marseillevirus nucleosomes, which have only N-tails of H2B and H4 (see Fig. 7A(a)), are tightly packed with no phasing over genes (Bryson et al., 2022), implying that the primary role of viral nucleosomes is tight packaging of chromatin rather than transcriptional regulation via histone modifications.
Moreover, in extant organisms, deletion of the N-tail in yeast is not lethal (Fig. 6E) (Kim et al., 2012). Deletion of each N-tail in the human nucleosome does not substantially affect nucleosome structure (Fig. 6E) (Iwasaki et al., 2013). Thus, we propose that all histone N-tails were less important in pre-LECA lineages and giant viruses than they are in extant eukaryotic cells.
Histone variants in LECA
We next wanted to address the question of the origin of histone variants. Besides histone N-tails, histone doublets, triplets and quartets may have prohibited the emergence of variants of each histone (Figs. 6B and 6C). Among a variety of giant viruses, only Medusavirus has four N-tails and the potential for emergence of each histone variant (Fig. 7A). Furthermore, although the majority of extant eukaryotes have both H2A.Z (H2A variant) and CENP-A (H3 variant), primitive eukaryotes such as G. lamblia and T. brucei have only one of the two (Fig. 7B(a)) (Akiyoshi and Gull, 2014; Talbert et al., 2019; Grau-Bové et al., 2022). Phylogenetic comparison of the nucleosome domain required for H2A.Z to bind chromatin (Fig. 7C) leads us to propose that the LECA had H2A.Z (Fig. 7B(b)). We have previously demonstrated that each of the point mutants H4-L97A, -Y98A, -G99A and H2B-D71A lacks H2A.Z on its chromatin (Kawashima et al., 2011; Nakabayashi et al., 2014, 2020; Nakabayashi and Seki, 2022). Since H4-L97, -Y98, -G99 (Fig. 7C(a)) and H2B-D71 (Fig. 7C(b)) are perfectly conserved in G. lamblia, we conclude that LECA had H2A.Z and G. lamblia lost it; this speculation is consistent with a previous report (Grau-Bové et al., 2022).
The emergence of H2A.Z (Fig. 7B(b)) and a nucleosome domain for H2A.Z (Fig. 7C) could specify the order in which histone doublets split. We propose that the H2B-H2A doublet (Fig. 7D(a)) split first, followed by the H4-H3 doublet (Fig. 7D(b)) (Fig. 6A). Phylogenetic analyses suggest that H4-L90 and -Y98 residues appeared in pre-LECA lineages, just like in some extant giant viruses (Fig. 7C(a) and Supplementary Fig. S1). In the presence of interactions between the H3 αN, the H4 α1 and the H2A' docking domain in pre-LECA, H3-L48, -I51, -R52, -Q55 and H4-R40 (Supplementary Fig. S2) would have gained function. Splitting of H2B-H2A would have enabled the emergence of H2A.Z, after which the H4-Y98 and H2B-D71 pair could acquire structure and functionality to maintain H2A.Z on chromatin. Thus, the interaction between the H3 αN and the H2A.Z' docking domain (PDB: 1F66) may be descended directly from the original interaction between the H3 αN and the H2A' docking domain (Fig. 7E).
Splitting of the H4-H3 doublet would have provided a free H4 C-terminus containing H4-Y98, allowing both H4-L97 and -G99 residues (adjacent to H4-Y98) to gain function in maintaining H2A.Z status (Fig. 7C(a)). Independently, a newly created H3 N-terminus would have been able to acquire common H3 N-tails that resemble the extant eukaryotic one (Figs. 5, 6 and 7E). We posit that the point at which this occurred represents the point of origin of the LECA, as the ‘alternative candidate of LECA’ (Fig. 7E), within Euglenozoa, to which T. brucei belongs. Of note, the ‘postulated LECA’ proposed in many studies is outside the Euglenozoa (Fig. 7E) (Al Jewari and Baldauf, 2023).
Importantly, if the ‘alternative candidate of LECA’ is correct, the LECA would resemble T. brucei, which does not have a CENP-A-dependent chromosome segregation system. Following the emergence of CENP-A, there would have been a transition from a CENP-A-independent to a CENP-A-dependent segregation system, as found in most extant eukaryotes (Fig. 7E). Trypanosoma brucei has 27 kinetoplastid kinetochore proteins including KKT10 and KKIP7, but not CENP-A or conventional kinetochore proteins (Ndc80 and Nuf2) (Butenko et al., 2020). By contrast, most eukaryotes including yeast and human have CENP-A, Ndc80 and Nuf2, but not KKT10 or KKIP7. Euglena gracilis, belonging to Euglenozoa, has CENP-A, Ndc80, Nuf2, KKT10 and KKIP7. Since E. gracilis could represent a transition intermediate state from a CENP-A-independent to a CENP-A-dependent segregation system, we propose that splitting of the H2B-H2A doublet led to the emergence of the H2A variant H2A.Z, and that splitting of the H4-H3 doublet ultimately led to establishment of the CENP-A-dependent segregation system after the establishment of the LECA (‘alternative candidate of LECA’).
Notably, regardless of whether the ‘alternative candidate of LECA’ or ‘postulated LECA’ is correct, some species, such as G. lamblia, lost H2A.Z (Fig. 7E), consistent with a previous report (Grau-Bové et al., 2022). Although T. brucei lost CENP-A from the ‘postulated LECA’, consistent with a previous report (Grau-Bové et al., 2022), it lost nothing from the ‘alternative candidate of LECA’ (Fig. 7E).
Differences in sequences of N-tails between T. brucei and S. cerevisiae
Another key unanswered question is the reason for the variation in H3 N-tail sequences. Like S. cerevisiae H2A.Z, T. brucei H2A.Z together with a Kinetoplastea-specific histone H2B variant, H2B.V, are enriched at transcription start sites (TSSs) (Fig. 8A) (Kraus et al., 2020). As shown in Figure 8B, the amino acid sequences of canonical histone N-tails in T. brucei are quite different from those of S. cerevisiae. Specifically, T. brucei H2A.Z is more enriched in GK/GGK sequences than H2A.Z of S. cerevisiae (Fig. 8B(a) and Supplementary Fig. S4A) (see Supplementary Fig. S9). Moreover, the Kinetoplastea-specific H2B.V is also rich in GK/GGK sequences (Fig. 8B(a) and Supplementary Fig. S4B). Given that the GK/GGK sequence found in the S. cerevisiae H4 N-tail is regulated by acetylation (Grunstein and Gasser, 2013), we hypothesize that nucleosomes containing T. brucei H2A.Z and H2B.V at a TSS are hyper-acetylated. Indeed, acetylation of nucleosomes at TSSs in T. brucei has been reported (Kraus et al., 2020), providing a good explanation for prokaryotic-type transcription regulation in T. brucei (Faria, 2021).
In terms of GK/GGK number, nucleosomes at TSSs have 30 GK/GGKs in T. brucei, although the canonical nucleosome has only two GK/GGKs (Fig. 8C(a)). By contrast, S. cerevisiae, as well as Acanthamoeba castellanii and Arabidopsis thaliana, commonly have 12–14 GK/GGKs in the canonical nucleosome, while their TSS nucleosomes have 14, 18 and 14 GK/GGKs, respectively (Fig. 8C(b)–(d)), reflecting that the number of GK/GGK sequences is always larger in H2A.Z than in canonical H2A (see Supplementary Fig. S9). Of note, GK/GGKs may be acetylation sites that cannot be methylated or ubiquitylated.
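GK/GGK tallies of this kind can be obtained from sequence with a simple motif count. The Python sketch below is an assumption about the counting procedure rather than the authors' method: it treats each GK or GGK as one site (a GGK is not additionally counted as a GK) and doubles the per-histone total because the octamer contains two copies of each histone. The sequences are invented toy strings, not real histone tails, and all function names are illustrative.

```python
import re

# One site per GK or GGK; the optional leading G keeps a GGK from
# being counted a second time as a GK.
MOTIF = re.compile(r"G{1,2}K")


def count_gk_sites(sequence: str) -> int:
    """Count non-overlapping GK/GGK sites in a protein sequence."""
    return len(MOTIF.findall(sequence.upper()))


def nucleosome_gk_sites(tails: dict[str, str]) -> int:
    """Total sites per nucleosome, assuming two copies of each histone."""
    return 2 * sum(count_gk_sites(seq) for seq in tails.values())


# Invented toy sequences for illustration only (NOT real histone tails).
toy_tails = {
    "H2A": "AGKTGGKSAGK",  # GK, GGK, GK
    "H2B": "PEPAKS",       # no site
    "H3": "MSTKAP",        # no site
    "H4": "SGGKA",         # GGK
}

for name, seq in toy_tails.items():
    print(name, count_gk_sites(seq))
print("per nucleosome:", nucleosome_gk_sites(toy_tails))
```

Replacing the canonical H2A or H2B entry with a variant sequence and recomputing would give the corresponding tally for a variant-containing nucleosome under the same assumptions.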
Next, we addressed the H3 N-tails found in a variety of eukaryotes, paying attention to GK/GGK sequences as an apparent marker of sequence variation (Fig. 9).
Possible differentiation of extant H3 N-tails from the predicted H3 N-tail in the LECA
Since the H3 N-tails of Medusavirus, Medusavirus stheno and Clandestinovirus resemble those of S. cerevisiae, we can predict the H3 N-tail that would have been found in the LECA (‘alternative candidate of LECA’); this predicted tail does not contain GGK (Fig. 4 and Fig. 9A, upper line). By contrast, the H3 N-tail of the LECA (‘postulated LECA’) reconstructed from H3 N-tail sequences of extant eukaryotes (Postberg et al., 2010) contains GGK (Fig. 4 and Fig. 9A, lower line). Notably, since lysine residues such as H3-K4, -K9, -K27 and -K36 are predicted to be conserved in both the ‘alternative candidate of LECA’ and the ‘postulated LECA’ (Fig. 9A), the LECA already had lysine methyltransferases acting on these residues, as proposed previously (Grau-Bové et al., 2022). Besides methyltransferases, a variety of histone H3 tail interactors in extant eukaryotes could have interacted with predicted ancient H3 N-tails due to high conservation between the human H3 N-tail and those in Figure 9A (Supplementary Fig. S5).
Upon alignment of H3 N-tails derived from a variety of eukaryotes, groups a–d in Figure 9B are mapped onto the phylogenetic tree of eukaryotes (Al Jewari and Baldauf, 2023) (Fig. 9C). Starting from the ‘postulated LECA’, Discoba (T. brucei and E. gracilis) could be the fourth branch of the eukaryotic tree after Parabasalia (Trichomonas vaginalis), Fornicata (G. lamblia) and Preaxostyla (Fig. 9C) (Al Jewari and Baldauf, 2023).
Remarkably, although E. gracilis (Fig. 9B group c) belongs to Euglenozoa and T. vaginalis (Fig. 9B group c) belongs to Parabasalia, their N-tails containing GGK are almost identical to that of S. cerevisiae (Figs. 9A and 9B group c) (Postberg et al., 2010). Conversely, T. brucei (Fig. 9B group a), which belongs to Euglenozoa, and G. lamblia (Fig. 9B group d), which belongs to Fornicata, have very different H3 N-tails lacking GGK compared with S. cerevisiae (Fig. 9B group c).
Starting from the ‘postulated LECA’, which has H2A.Z, CENP-A and a H3 N-tail containing GGK, G. lamblia lost both H2A.Z and the GGK sequence in the H3 N-tail. Trypanosoma brucei lost both CENP-A and the GGK sequence in the H3 N-tail. By contrast, starting from the ‘alternative candidate of LECA’, although G. lamblia lost H2A.Z, T. brucei lost nothing. A lack of CENP-A but not of H2A.Z is lethal in S. cerevisiae, implying that H2A.Z could be lost much more easily than CENP-A in primitive eukaryotes. Thus, we favor the ‘alternative candidate of LECA’ model; however, this conclusion is supported by very few reports (Cavalier-Smith, 2010; Akiyoshi and Gull, 2014). Notably, the LCEA (last common euglenozoan ancestor) has been reconstructed (Vesteg et al., 2019). If the ‘alternative candidate of LECA’ is the true LECA, the LECA closely resembles the LCEA (Fig. 9C).
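The loss-counting argument above can be made explicit with set differences. This is a minimal Python sketch: the trait sets are transcribed from the text (presence of H2A.Z, CENP-A and a GGK-containing H3 N-tail), while the set encoding and all names used are illustrative assumptions.

```python
H2AZ, CENPA, GGK = "H2A.Z", "CENP-A", "GGK in H3 N-tail"

# Traits assigned to each candidate root, as described in the text.
candidates = {
    "postulated LECA": {H2AZ, CENPA, GGK},
    "alternative candidate of LECA": {H2AZ},
}

# Traits present in the two extant species discussed.
extant = {
    "G. lamblia": {CENPA},  # lacks H2A.Z and GGK in the H3 N-tail
    "T. brucei": {H2AZ},    # lacks CENP-A and GGK in the H3 N-tail
}


def losses(ancestor: set, descendant: set) -> set:
    """Traits present in the ancestor but absent from the descendant."""
    return ancestor - descendant


def total_losses(root: str) -> int:
    """Number of losses needed across both species for a given root."""
    return sum(len(losses(candidates[root], traits)) for traits in extant.values())


for root in candidates:
    for species, traits in extant.items():
        print(root, "->", species, "lost:", sorted(losses(candidates[root], traits)))
    print(root, "total losses:", total_losses(root))
```

With these inputs the postulated root requires four losses and the alternative root one. The sketch counts losses only; it ignores gains such as the later emergence of CENP-A and does not weight losses by how easily each trait can be lost.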
Finally, besides the ‘alternative candidate of LECA’ and ‘postulated LECA’ in Figure 9B, many other roots of eukaryotes, as the LECA, have been predicted (Hampl et al., 2009; Derelle et al., 2015; Gabaldón, 2021). In future work, any LECA models should appropriately explain the evolutionary history of eukaryotic canonical and variant nucleosomes using the information presented in this study.
This study used 26 conserved histone residues as an evolutionary searchlight to elucidate the pathway via which the archaeal nucleosome evolved into the extant eukaryotic nucleosome (Fig. 2D and Supplementary Figs. S1, S6A and S6B).
Among the 26 histone residues, eight residues were already present in the archaeal nucleosome: H3-E97, -H113, -R116, -T118, -D123; H4-R36, -K39 (corresponding to yeast R39) and -R45 (Supplementary Figs. S1 and S6B(a)). We propose that following the establishment of a nucleosome containing two H2B-H2A doublets and two H4-H3 doublets (Figs. 7E and 9C and Supplementary Fig. S6B(b)), 16 histone residues (H2A-Y58, -E62, -L66, -D91, -E93, -L94; H2B-L109; H3-L48, -I51, -F54, -Q55, -L62, -L130; H4-R40, -L90 and -Y98) in pre-LECA lineage cells co-evolved with those of giant viruses (Supplementary Fig. S6B(c)). Thus, nucleosomes of Marseillevirus and Melbournevirus (Supplementary Fig. S6B(d)), in which no interaction occurs between H3 αN and the H2A' docking domain (Fig. 3A(b)), may be directly descended from the first nucleosome (Supplementary Fig. S6B(b)).
During parallel evolution of nucleosomes in a variety of pre-LECA lineages, some nucleosomes occasionally acquired interaction between the H3 αN and the H2A' docking domain (Fig. 7E ①), allowing separation of the H4-H3 doublet (Figs. 3, 5 and 7). In our model, the splitting of the H2B-H2A doublet rather than the H4-H3 doublet first occurred in a eukaryotic lineage nucleosome (Fig. 7E and Supplementary Fig. S6B(e)). Soon after, H2A.Z emerged and the interaction of H2B-D71 with H4-Y98 was fixed on the H2B sequence to allow stabilization of H2A.Z on chromatin.
Next, the splitting of the H4-H3 doublet occurred. After that, the newly formed H3 N-terminus acquired the common H3 N-tail (Fig. 7E and Supplementary Fig. S6B(f)), establishing the LECA (‘alternative candidate of LECA’). At that time, one residue, H4-Y72, which is lethal when mutated and is involved in the interaction between H4 and H2B at their four-helix bundle, became accidentally fixed on the eukaryotic lineage H4 sequence (Supplementary Fig. S6B(f)), completing the accumulation of all 26 conserved residues listed in Fig. 2 and Supplementary Figs. S1 and S6A. Further, duplication of H3 gave rise to the H3 variant CENP-A (Figs. 7E and 9C and Supplementary Fig. S6B(g)), and, consequently, the CENP-A-dependent chromosome segregation system could be established (Figs. 7E and 9C).
Finally, since in our study we used only 26 histone residues (Fig. 2D and Supplementary Fig. S1) out of a total of 170 functional residues (Supplementary Figs. S3B, S3C, and S6A) to successfully gain insights into the evolution of eukaryotic histones, future structural and phylogenetic approaches that examine all 170 histone residues will allow for a more precise analysis of the evolutionary path of the eukaryotic nucleosome at single amino acid resolution.
The authors declare that they have no known competing financial interests or personal relationships that could appear to influence the work reported in this paper.
We thank Dr. M. Horikoshi (Jobu University) for the initial analyses of the histone point mutant library and the H2B-H2A doublet. We thank Ms. M. Seki for illustrating some of the figures. This study was financially supported by the Tohoku Medical and Pharmaceutical University, whose founding spirit is “We will open the gate of truth”; thus, we wish to extend our gratitude to Dr. M. Takayanagi, president of the University.