Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full Paper
A hypothesis for nucleosome evolution based on mutational analysis
Yu Nakabayashi, Masayuki Seki
Keywords: histone, nucleosome, LECA, giant virus, H2A.Z

2025, Vol. 100, Article ID: 24-00143

ABSTRACT

Nucleosomes are complexes of DNA and histone proteins that form the basis of eukaryotic chromatin. Eukaryotic histones are descended from archaeal homologs; however, how this occurred remains unclear. Our previous genetic analysis of the budding yeast nucleosome identified 26 histone residues conserved between Saccharomyces cerevisiae and Trypanosoma brucei: 15 that are lethal when mutated and 11 that are synthetically lethal with deletion of the FEN1 nuclease. These residues are partially conserved in the histones of a variety of giant viruses, allowing us to trace the route by which they were established in the LECA (last eukaryotic common ancestor). We analyzed yeast nucleosome genetic data to generate a model for the emergence of the eukaryotic nucleosome. In our model, the histone H2B-H2A and H4-H3 doublets found in giant virus nucleosomes facilitated the formation of the acidic patch surface and the nucleosome entry sites of the eukaryotic nucleosome, respectively. Splitting of the H2B-H2A doublet gave rise to the H2A variant H2A.Z, and subsequent splitting of the H4-H3 doublet led to a eukaryote-specific domain required for chromatin binding of H2A.Z. We propose that the LECA emerged when the newly split H3 N-terminus horizontally acquired a common N-tail found in extinct pre-LECA lineages and in some extant giant viruses. This hypothesis predicts that the H3 variant CENP-A and CENP-A-dependent chromosome segregation emerged after the LECA, implying that the root of all eukaryotes lies within Euglenida.

INTRODUCTION

Histones are among the most highly conserved eukaryotic proteins, suggesting that the histones of the last eukaryotic common ancestor (LECA) were almost identical to those found in extant eukaryotes. In extant eukaryotes, the nucleosome is composed of a histone octamer (comprising two H2A/H2B dimers and an [H3/H4]2 tetramer) that wraps 146 bp of DNA (Luger et al., 1997). Although evidence suggests that eukaryotic histones are descended from archaeal histones (Mattiroli et al., 2017), how the simpler archaeal nucleosome was transformed into the eukaryotic nucleosome remains unclear. Some giant viruses encode histones in a variety of configurations, including singlets, doublets, triplets and quadruplets (H2A, H2B, H3, H4, H2B-H2A, H2A-H2B, H4-H3, H3-H4, H2B-H2A-H3 and H2B-H2A-H3-H4) (Talbert et al., 2022; Irwin and Richards, 2024). In particular, giant virus nucleosomes that contain two H2B-H2A doublets and two H4-H3 doublets, and are thus remarkably similar to the eukaryotic nucleosome, have been observed (Liu et al., 2021; Valencia-Sánchez et al., 2021), suggesting that the eukaryotic nucleosome evolved alongside those of giant viruses. However, the evolutionary path of the eukaryotic nucleosome from (or to) the giant virus nucleosome remains unclear.

While providing important insights into the likely origin of the eukaryotic nucleosome, previous evolutionary work on histone biology and the LECA has raised several further questions. First, the position of the root of all eukaryotic lineages (the LECA) is still debated (Gabaldón, 2021). Furthermore, Trypanosoma brucei does not have a CENP-A-dependent chromosome segregation system (Akiyoshi and Gull, 2014; Tromer et al., 2021). Thus, either the LECA had a CENP-A-dependent chromosome segregation system that was later lost in T. brucei, or the LECA had no CENP-A system and the system was acquired after the LECA. It is not yet clear which of these two possibilities is correct. In addition, histones possess an unstructured N-terminus that protrudes from the nucleosome and is often subject to posttranslational modifications that regulate the degree of chromatin compaction; however, in T. brucei and Giardia lamblia the amino acid sequences of the H3 N-tails are poorly conserved compared with those in other eukaryotes, including Saccharomyces cerevisiae (Postberg et al., 2010). The reason for this difference remains elusive.

Another open question is the origin of the H2A variant H2A.Z. The G. lamblia genome does not harbor this variant (Talbert et al., 2019), suggesting either that the LECA had H2A.Z and G. lamblia lost it, or that the LECA had no H2A.Z and the variant was acquired later. Again, it is not yet clear which of these two possibilities is correct. Finally, it is unclear whether the H2B-H2A and H4-H3 doublets found in giant viruses are the ancestors or the descendants of the extant H2A, H2B, H3 and H4 singlets (Talbert et al., 2022). Although phylogenetic analyses of histone genes from metagenomes imply that the H2B-H2A and H4-H3 doublets are ancestors of the eukaryotic nucleosome (Irwin and Richards, 2024), it remains unclear how H2B-H2A and H4-H3 were mechanistically converted into separate histone singlets.

Using yeast genetics, we have previously identified 15 histone residues that are lethal when mutated (Sakamoto et al., 2009) and 11 that are synthetically lethal in combination with the loss of the Okazaki fragment-processing enzyme FEN1 (Nakabayashi and Seki, 2023). We define here a ‘functional residue’ as a residue whose mutation causes a change in phenotype. All of the above 26 functional histone residues are perfectly conserved between S. cerevisiae and T. brucei (Nakabayashi and Seki, 2023) and thus can be used as an evolutionary searchlight to address the above six questions. We performed a systematic analysis of the existing yeast genetics data and present a detailed hypothesis for the evolution of the nucleosome from archaea to extant eukaryotes.

RESULTS

The H2B-H2A histone doublet is more stable than the H2A/H2B heterodimer

Analysis of the crystal structure of three histone B homodimers from Methanothermus fervidus (HMfB) bound to 90 bp of DNA has provided valuable insights into the structure of the archaeal nucleosome (Fig. 1A(a)) (Mattiroli et al., 2017). Together with the solution structure of a human H2A/H2B heterodimer (Fig. 1A(b)) (Moriwaki et al., 2016), this analysis demonstrated that the histone fold (HF) is highly conserved and is found in both the archaeal nucleosome and the human H2A/H2B heterodimer (Fig. 1A(c)). However, major structural differences between Figure 1A(a) and (b) are found in the regions flanking the HF. The H2A αN, H2A βC and docking domain, all of which are ordered in the human nucleosome, are disordered in the free heterodimer in solution (Fig. 1A(c)). Moreover, the L3 region of H2B is flexible, allowing the H2B αC to rotate (Fig. 1A(c)).

Fig. 1. The postulated ancestral state of histones H2A and H2B. Colors of DNA and histone chains in each nucleosome structure presented in Figure 1A, 1B and 1C are described in the corresponding PDB accession. (A) Structures of histones. (a) Structure of an archaeal nucleosome comprising three HMfB homodimers and 90 bp of DNA (PDB: 5T5K). The black rectangle represents a single HMfB monomer containing a histone fold (HF). (b) Structure of the human H2A/H2B dimer in solution. The yellow and red rectangles represent H2A and H2B, respectively. (c) Schematic representation of the secondary structures found in 5T5K (Fig. 1A(a)) and 1AOI (see Fig. 1C(b)). An HF consisting of α1, α2 and α3 is commonly found in HMfB, H2A and H2B. α and β represent α-helix and β-strand, respectively. In addition to the HF, H2A has αN, αC, βC and docking domains, and H2B has αC. L1, L2 and L3 represent the loops between the corresponding α-helices. The H2A αN, βC and docking domains (Fig. 1A(b)) are disordered in solution. Moreover, the L3 region of H2B is flexible, allowing the H2B αC to rotate in solution. (B) Histone doublets. (a) Schematic of a theoretical histone H2B-H2A doublet in solution, in which the domain organization would prevent rotation of the H2B αC and keep the H2A αN ordered. (b) Schematic of a theoretical histone H2A-H2B doublet in solution, in which the domain organization would keep the H2A βC and docking domain ordered; however, the H2A αN would be disordered and the H2B αC would rotate in solution. (c), (d), (e) Structures containing artificially fused H2B-H2A or H2B-H2A.Z (H2A variant) doublets. PDB IDs are indicated to the right of each structure. (f) FALC (functional analysis of linker-mediated complex) strategy. Histone doublets such as H2B-H2A (No. 1, 5), H2B-H2A.Z (No. 2, 3, 4) and H2B-macroH2A1 (No. 6) are functional in cells. (C) The H2B-H2A doublet first model. (a) H2B-H2A (Fig. 1B(a)–(f)) is more stable than the H2A/H2B heterodimer. (b) Structure of the Xenopus laevis nucleosome containing two H2A/H2B heterodimers. (c), (d) Structures of nucleosomes found in giant viruses, which exhibit remarkable similarity to the X. laevis nucleosome. Each giant virus nucleosome contains two H2B-H2A doublets. (e), (f) Schematic representation of the possible lineage models of H2B-H2A and H2A/H2B. H2B-H2A may be an ancestor of the H2A/H2B heterodimer (e), or the H2A/H2B heterodimer may be an ancestor of the H2B-H2A doublet (f). The thick arrow represents the model we favor (e).

Since H2B-H2A and H2A-H2B histone doublets are found in some giant viruses (Talbert et al., 2022), we modeled theoretical fusions of human H2A and H2B, as shown in Figure 1B(a) and (b). There is good evidence that such histone doublets would be more structurally stable than the H2A/H2B heterodimer. Indeed, the structural stability of artificially fused H2B-H2A and H2B-H2A.Z (H2A variant) doublets has been exploited several times to facilitate structural analyses of these proteins (Fig. 1B(c)–(e)). Moreover, both H2B-H2A and H2B-H2A variant doublets are functional in budding yeast, chicken and human cells (Fig. 1B(f)) (Nakabayashi et al., 2014, 2020; Ruiz and Gamble, 2018; Kitagawa et al., 2021), and an H2B-H2A doublet rescues the lethality of a double deletion mutant of the H2A and H2B genes (Fig. 1B(f), No. 1) (Nakabayashi et al., 2014). Thus, we suggest that the H2B-H2A doublet is more stable than the H2A/H2B heterodimer (Fig. 1C(a)).

Structurally, eukaryotic nucleosomes (Fig. 1C(b)) (Luger et al., 1997) are very similar to giant virus nucleosomes (Fig. 1C(c)(d)) (Liu et al., 2021; Valencia-Sánchez et al., 2021). Although it has not yet been determined whether H2B-H2A is the ancestor (Fig. 1C(e)) or the descendant (Fig. 1C(f)), these analyses suggest that the former is more likely. Moreover, a recent systematic phylogenetic analysis of 258 histones of 168 giant viral metagenomes revealed that viral histone doublets originated in stem eukaryotes and that nucleosome evolution proceeded through histone doublet intermediates (Irwin and Richards, 2024). Thus, we tentatively subjected the ‘H2B-H2A ancestor hypothesis’ to further consideration (Fig. 1C(e)).

Evolution of the eukaryotic acidic patch surface through the H2B-H2A doublet

To explore the H2B-H2A ancestor hypothesis, we followed the evolutionary path from archaeal to eukaryotic histones. Some archaea have multiple histone genes (Nishida and Oshima, 2017), and archaeal histones sometimes carry both N- and C-tails outside the HF (Mattiroli et al., 2017). Histone doublets are also found in some archaea (Talbert et al., 2019). Taken together, this evidence suggests that prototypes of the extant histones H2A, H2B, H3 and H4 existed in some archaea, and that proto-H2A and proto-H2B, both of which had long N- and C-tails, later fused to give an H2B-H2A doublet (Fig. 2A ①). The long linker region between H2B and H2A would allow the formation of two α-helices, H2B αC and H2A αN. Notably, in the nucleosome, the rotation of the extant eukaryotic H2B αC (Fig. 1A(c)) is restricted by a mutual interaction between H2B αC and H2A αN. The surface created by the interaction between H2A α2 and H2B αC could have formed a primitive landing pad for primitive nucleosome-binding proteins (Fig. 2B ②③).

Fig. 2. The origin of the acidic patch on extant nucleosomes. (A) Schematic representation of the means by which H2B-H2A may have gained function. Based on the ‘H2B-H2A first’ model, we suggest that H2B fused to H2A (①). (B) Three α-helices. The linker region between H2B and H2A (Fig. 2A), which forms the αC of H2B (③), could interact with α2 of H2A (②) upon histone folding, creating an initial primitive surface that provides a landing pad for nucleosome-binding factors. This surface might have been beneficial to pre-LECA lineages, triggering the formation of the αC of H2A (④) from what was initially a disordered region. In the ancient nucleosome, the αC of H2A interacted with both α2 of H2A and αC of H2B, forming a primitive acidic patch. Like the extant αC of H2B, the ancient αC of H2A is expected to be fixed only in the ancient nucleosome and not in an H2B-H2A doublet in solution. Since the primitive acidic patch would confer a significant advantage on pre-LECA lineages, mutual interactions between α3 and αC in H2A may have accumulated under positive selection (⑤). Residues that are lethal when mutated or synthetically lethal when mutated in the absence of the FEN1 nuclease are mapped onto the nucleosome surfaces (PDB ID: 1ID3). Yellow and red chains represent histones H2A and H2B, respectively. Residues are indicated on the left-hand side. Unshaded labels denote residues within the HF, and labels with gray backgrounds denote residues outside the HF. (C) The fixed acidic patch of H2B-H2A in solution. During the proposed series of evolutionary steps (①–⑤), the primitive acidic patch was likely to become fixed on H2B-H2A even in solution, making it available to interacting factors regardless of the state of the nucleosome. (D) Comparison of histone residues on the acidic patch. Residues that are lethal when mutated or synthetically lethal when mutated in the absence of the FEN1 nuclease are marked in blue or red, respectively. Although the eight residues shown here are highly conserved between S. cerevisiae and T. brucei, the corresponding residues in a variety of giant viruses are only partially conserved. With the exception of H2B-D71 (gray column), all residues shown in Figure 2B can be identified in giant viruses. H2A-Y58 and -E62 map to H2A α2. H2A-D91 and -E93 map to L3 and αC of H2A, respectively. H2B-L109 is found on αC of H2B. All of these residues (H2A-Y58, -E62, -D91, -E93 and H2B-L109) show more than 70% conservation among giant viruses, suggesting that the acquisition of function by histone residues (⑥) co-evolved with the formation of the primitive acidic patch surface (①–⑤). The accession numbers of H2A and H2B sequences used in the figure are as follows: S. cerevisiae H2A (UniProtKB P04911), S. cerevisiae H2B (UniProtKB P02294), Homo sapiens H2A (UniProtKB P04908), H. sapiens H2B (UniProtKB P06899), T. brucei H2A (UniProtKB Q57YA3), T. brucei H2B (UniProtKB Q389T1), Pandravirus H2B (OFAI01000004), Marseillevirus H2B-H2A (ADB04176), Melbournevirus H2B-H2A (YP_009094870.1), Medusavirus H2A (BBI30458.1), Medusavirus H2B (BBI30201.1), Medusavirus stheno H2A (QPB44482.1), Medusavirus stheno H2B (QPB44246.1), Clandestinovirus H2B-H2A (QYA187), Marine iridovirus H2B-H2A (IR01_SRX802077.164_contig_168297), Loki’s Castle H2B-H2A (LCPAC001_QBK89657.1), Loki’s Castle H2B-H2A-H3 (LCMAC102_QBK86552.1), Loki’s Castle H2B-H2A-H3-H4 (LCMAC101_QBK85747.1), Indivirus H2A-H2B (ARF09917), and Klosneuvirus H2A-H2B (Talbert et al., 2022).
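A conservation figure such as "more than 70% among giant viruses" is a per-position count over an alignment. The sketch below shows one way such a fraction could be computed; it is not the pipeline used for Fig. 2D. It assumes that all sequences are already aligned to the yeast reference and that a gap or any substitution counts as "not conserved". The function name, site labels and sequences are hypothetical toy data, not the alignments behind the figure.

```python
"""Per-site conservation across aligned viral histones (illustrative sketch).

All sequences and site labels are hypothetical toy data, not the alignments
behind Fig. 2D.
"""


def site_conservation(reference: str, viral: dict[str, str],
                      sites: dict[str, int]) -> dict[str, float]:
    """For each labelled alignment column, return the fraction of viral
    sequences that carry the same residue as the reference (yeast) sequence.

    Assumes every sequence is pre-aligned to the reference; a gap or any
    substitution counts as "not conserved".
    """
    if not viral:
        raise ValueError("need at least one viral sequence")
    for name, seq in viral.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference")
    result = {}
    for label, column in sites.items():
        ref_residue = reference[column]
        hits = sum(1 for seq in viral.values() if seq[column] == ref_residue)
        result[label] = hits / len(viral)
    return result


if __name__ == "__main__":
    # Hypothetical three-column alignment: one yeast reference, four viruses.
    yeast = "YEL"
    viruses = {"virus_1": "YEL", "virus_2": "YDM", "virus_3": "FEM", "virus_4": "YEM"}
    sites = {"site_1": 0, "site_2": 1, "site_3": 2}
    fractions = site_conservation(yeast, viruses, sites)
    print(fractions)  # {'site_1': 0.75, 'site_2': 0.75, 'site_3': 0.25}
    print([label for label, f in fractions.items() if f > 0.7])  # ['site_1', 'site_2']
```

Treating gaps as non-conserved is a conservative choice; excluding gapped sequences from the denominator instead would raise the fractions for poorly aligned regions.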

It is possible that the H2B-H2A doublet had an ancient long, disordered C-terminal tail whose region adjacent to the H2A HF could interact with both H2A α2 and H2B αC, giving rise to the H2A αC and a primitive acidic patch on the ancient nucleosome (Fig. 2A ④). Of the 26 conserved histone residues we previously identified (Nakabayashi and Seki, 2023), seven are found on the acidic patch (Fig. 2B). Since the acidic patch of extant nucleosomes serves as a landing pad for half of all nucleosome-interacting factors (Skrajna et al., 2020), a strong Darwinian driving force is likely to have favored mutual interaction between H2A α3 and H2A αC (Fig. 2A ⑤). Thus, the H2B-H2A doublet, but not an H2A/H2B heterodimer or an H2A-H2B doublet, would form a stable acidic patch (Fig. 2C), supporting the H2B-H2A ancestor hypothesis. In this scenario, once the progenitor of the acidic patch was established, co-evolution between the acidic patch and its interactors would have occurred independently in a variety of pre-LECA lineages, as reflected in the great variation of residues found in different giant viruses (Fig. 2D ⑥).

The H4-H3 doublet may be the ancestor of extant histones

We next asked whether the H4-H3 doublet is also likely to be an ancestor of the extant heterodimer by further examining the theoretical stability of these complexes. Either the absence of an H2A/H2B heterodimer (PDB: 7X57, 2IO5, 5BS7) or the lack of interaction between the H3 αN and the H2A' docking domain (PDB: 6M4G) leads to disordering of the H3 αN (Fig. 3A(a)), which normally forms an α-helix in the nucleosome. Moreover, the H4 βC, which in the nucleosome interacts with the H2A' βC to form a short β-sheet, is disordered in the absence of the H2A/H2B dimer (PDB: 7X57, 5BS7) (Fig. 3A(a)). Notably, in the two available structures of giant virus nucleosomes, there is no interaction between the H2A' docking domain and the H3 αN (PDB: 7LV8, 7N8N). Thus, we predict that in these giant virus nucleosomes, the H3 αN would become disordered if H4-H3 were artificially split into H4 and H3 monomers (Fig. 3A(b)).

Theoretical fusion of H3 and H4 monomers can produce either an H4-H3 or an H3-H4 doublet (Fig. 3A(c)(d)). Given that the H3 αN and H4 βC are expected to be structurally stable only in the H4-H3 doublet (Fig. 3B), we predict that H4-H3 is the ancestor of the extant eukaryotic H3/H4 heterodimer. Figure 3C shows the residues of HRD-II (Fig. 3C(a)), HRD-III (Fig. 3C(b)) and HRD-IV (Fig. 3C(c)), together with related residues; the residues of HRD-I (homologous recombination domain I) (Nakabayashi and Seki, 2023) are shown in Figure 2B. Among these residues, four conserved residues on the H3 αN (H3-L48, -I51, -F54 and -Q55) (Fig. 3C(a)) and one on the H4 βC (H4-Y98) (Fig. 3C(c)) could have been established only in the H4-H3 doublet (Fig. 3A(c)). Thus, it seems likely that the H4-H3 doublet is the ancestor (Fig. 3A(e)) rather than the descendant (Fig. 3A(f)) of the H3/H4 heterodimer.

Fig. 3. Formation of the H3 αN and H4 βC domains. (A) States of the H3/H4 heterodimer, H3-H4 doublet and H4-H3 doublet. Colors of DNA and histone chains in each nucleosome structure are described in the corresponding PDB accession. (a) Schematic secondary structures found in 5T5K (Fig. 1A(a)) and 1AOI (Fig. 1C(b)). A histone fold (HF) composed of α1, α2 and α3 is commonly found in HMfB (black), H3 (blue) and H4 (green). In addition to the HF, H3 has an αN and H4 has a βC. Notably, the H3 αN is disordered in some structures (7X57, 6M4G, 2IO5 and 5BS7), and the H4 βC is also disordered in some structures (7X57 and 5BS7). (b) Lack of interaction between the H3 αN and the H2A' docking domain leads to disordering of the H3 αN in the extant histone complex (Fig. 3A(a)). The nucleosomes of both giant viruses (7LV8 and 7N8N), which comprise two H2B-H2A doublets, two H4-H3 doublets and 121 bp of DNA, show no interaction between the H3 αN and the H2A' docking domain; therefore, artificial splitting of the H4-H3 doublet would lead to disordering of the H3 αN in these nucleosomes. The linker region between H4 and H3 in the H4-H3 doublet interacts with nucleosomal DNA. Thus, it is possible that the split H3 N-terminus tethers DNA and stabilizes the H3 αN in nucleosomes containing separate H3 and H4. However, even though the region N-terminally adjacent to the extant H3 αN can interact with DNA (see Fig. 4), the H3 αN of human H3 is disordered in some structures (Fig. 3A(a)). Thus, it is likely that if the H4-H3 doublet were split in these giant virus nucleosomes, the H3 αN would become disordered and lose function. (c) A theoretical histone H4-H3 doublet in solution, with ordered H4 βC and H3 αN. (d) A theoretical histone H3-H4 doublet in solution, with disordered H4 βC and H3 αN. (e), (f) Schematic representation of the possible lineage models of H4-H3 and H3/H4. The H4-H3 doublet may be an ancestor of the H3/H4 heterodimer (e), or the H3/H4 heterodimer may be an ancestor of the H4-H3 doublet (f). The thick arrow represents the model we favor (e). (B) Formation of the H3 αN and the H4 βC in an H4-H3 doublet. Experimental and theoretical considerations suggest that an H4-H3 doublet is beneficial for the functional H3 αN and H4 βC domains, even in solution. (C) The H3 αN and H4 βC confer function. Residues that are lethal when mutated or synthetically lethal when mutated in the absence of the FEN1 nuclease are mapped onto the nucleosome surfaces (PDB ID: 1ID3). Blue and green chains represent histones H3 and H4, respectively. Residues are indicated on the left-hand side. Unshaded labels denote residues within the HF, and labels with gray backgrounds denote residues outside the HF. Among the residues presented in (a)–(c), H3-L48, -I51, -F54 and -Q55 are found on the H3 αN, and H4-Y98 is on the H4 βC. Eighteen histone residues in S. cerevisiae are partially conserved in histones H3 and H4 in a variety of giant viruses (Supplementary Fig. S1).

Evidence for horizontal transfer of the H3 N-terminal region

Eighteen of the conserved residues previously identified on H3 or H4 (Nakabayashi and Seki, 2023), including the five residues discussed above, are perfectly conserved in Phycodnavirus H3 or Bracovirus H4 (Supplementary Fig. S1). Thus, these viral histones may have been derived from extant eukaryotes. By contrast, these 18 residues show mosaic conservation in a variety of giant virus histones (Supplementary Fig. S1), suggesting that these giant virus histones were derived from pre-LECA lineages (Irwin and Richards, 2024). Since three of the residues that are lethal when mutated (H3-L48, -I51 and -Q55) are located on the H3 αN, which is disordered in some structures (Fig. 3A(a)), we next compared the N-terminal region just upstream of the H3 HF (including the H3 αN) across a variety of giant virus H3 histones (Fig. 4).

Fig. 4. Common sequence of the region N-terminally adjacent to the H3 αN. Comparison of H3 N-terminal regions containing the H3 αN. Since the budding yeast H3 αN contains residues that are lethal when mutated (H3-L48, -I51 and -Q55) (pink residues), disordering of the H3 αN (Fig. 3A) could result in loss of canonical nucleosome function. H3-L48, -I51 and -Q55 are required for interaction between the H3 αN and the H2A' docking domain (Supplementary Fig. S2). These H3 αN residues are partly or perfectly conserved among the H3 histones of a variety of giant viruses. Although Medusavirus retains only one of these residues (H3-Q55), the H3 sequence N-terminally adjacent to the H3 αN is almost identical in budding yeast and Medusavirus. This adjacent H3 sequence is highly conserved in eight giant viruses, including Medusavirus. In S. cerevisiae, this common H3 sequence interacts with nucleosomal DNA (black and gray dots) and is functional (orange dots). Black and gray thick dashed lines represent disordered regions identified by structural analyses. A nucleosome containing an H3 mutant lacking residues 1–27 forms essentially the same structure as the wild-type nucleosome (3W98). Notably, an H3 mutant lacking residues 1–30 is viable in S. cerevisiae. Reconstruction of the amino acid sequence of the H3 N-tail of the pre-LECA: only three free H3 N-tail sequences are available from giant viruses. We first constructed a consensus sequence for the N-tails of these three viruses using the following rules. Residues found in more than one virus are considered fixed. Residues found in only one virus but conserved with S. cerevisiae H3 are also considered fixed. Although the H4-H3, H2B-H2A-H3 and H2B-H2A-H3-H4 oligomers of seven viruses have no free H3 N-terminus, the H3 αN region of these viruses is similar to that of S. cerevisiae. Thus, the H3 αN region was reconstructed using a total of ten giant viruses as well as S. cerevisiae. H3 F54 of S. cerevisiae appears to correspond to Y in Wiseana iridescent virus (H4-H3), Marine iridovirus (H4-H3) and Loki’s Castle (H2B-H2A-H3-H4); thus, we fixed Y on the predicted H3 αN of the pre-LECA. Similarly, H3 E50 and R52 were also fixed on the predicted H3 αN of the pre-LECA. Colors and symbols in the predicted H3 N-tail of the pre-LECA are as follows: blue, residue interacting with DNA (black and gray closed circles; PDB: 1ID3, AV2); pink, residue lethal when mutated (Fig. 3C and Supplementary Fig. S2); reddish brown, residue interacting with the H2A' docking domain (Supplementary Fig. S2); purple, HRD-II residue (Fig. 3C(a)); red, modifiable lysine residue; green, GK/GGK sequence; pale blue, predicted residue in the pre-LECA; orange closed circle, functional residue in S. cerevisiae (Supplementary Fig. S3B). The sequence of the H3 N-tail containing GGK was predicted by comparison of extant eukaryotes (Postberg et al., 2010). The accession numbers for the H3 N-tail regions used are as follows: S. cerevisiae H3 (UniProtKB P61830), Phycodnavirus H3 (ERX552270.64), Medusavirus H3 (BBI30395.1), Medusavirus stheno H3-H4 (QPB444), Clandestinovirus H3 (QYA18687.1), Marseillevirus H4-H3 (YP_003407137.1), Melbournevirus H4-H3 (YP_009094869.1), Wiseana iridescent virus H4-H3 (YP_004732864.1), Marine iridovirus H4-H3 (SRX802077.164_contig_168297), Loki’s Castle H4-H3 (LCMAC101_QBK85672.1), Marine iridovirus H2B-H2A-H3 (SRX802077.164_contig_92501), Loki’s Castle H2B-H2A-H3 (LCMAC102_QBK86552.1), Loki’s Castle H2B-H2A-H3-H4 (LCMAC102_QBK86460.1) and Loki’s Castle H2B-H2A-H3-H4 (LCMAC101_QBK85747.1).
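The two fixing rules stated in the Fig. 4 caption can be written as a column-by-column procedure. The sketch below is a hypothetical illustration, not the authors' procedure. It assumes that the sequences are already aligned, that gaps ("-") are ignored when counting, and that columns fixed by neither rule are reported as "."; the function name and example sequences are invented.

```python
from collections import Counter


def reconstruct_consensus(viral_tails: dict[str, str], yeast_tail: str,
                          gap: str = "-") -> str:
    """Column-by-column consensus following the two rules in the Fig. 4 caption.

    Rule 1: a residue found in more than one virus is fixed.
    Rule 2: a residue found in only one virus is fixed if it matches the
            S. cerevisiae H3 residue in the same column.
    Columns fixed by neither rule are reported as '.'.
    """
    length = len(yeast_tail)
    if any(len(seq) != length for seq in viral_tails.values()):
        raise ValueError("all sequences must be aligned to the same length")
    consensus = []
    for i in range(length):
        # Count residues in this column, ignoring gaps.
        counts = Counter(seq[i] for seq in viral_tails.values() if seq[i] != gap)
        shared = [aa for aa, n in counts.items() if n > 1]
        if shared:
            # Rule 1. With three viruses at most one residue can be shared;
            # with more viruses the most frequent one is taken (an assumption).
            consensus.append(max(shared, key=lambda aa: counts[aa]))
        elif yeast_tail[i] in counts:
            # Rule 2: a singleton residue that agrees with yeast H3.
            consensus.append(yeast_tail[i])
        else:
            consensus.append(".")
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical aligned N-tails for three viruses plus yeast H3.
    viral = {"virus_A": "ARTKQP", "virus_B": "AKSKAG", "virus_C": "GRTRE-"}
    print(reconstruct_consensus(viral, "SRTKAK"))  # ARTKA.
```

In the example, the first four columns are fixed by Rule 1, the fifth by Rule 2 (only virus_B carries A, which matches yeast), and the last column is left unfixed because neither viral residue matches yeast K.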

Of the histone residues that are lethal when mutated in S. cerevisiae, three (L/I/V48, I/V51 and Q55) are also present in seven giant viruses (Fig. 4). Since, in yeast, these three residues, together with H3-R52, are involved in the interaction between the H3 αN and the H2A' docking domain (Supplementary Fig. S2), this interaction may have co-evolved among pre-LECA lineages and giant viruses. In the nucleosomes of Marseillevirus and Melbournevirus, which lack this interaction (Fig. 3A(b)), only one residue, H3-V51 (corresponding to H3-I51 of S. cerevisiae), is conserved. Splitting of the H4-H3 doublet in the absence of interaction between the H3 αN and the H2A' docking domain is likely to lead to disordering of the H3 αN (Fig. 3A(b)). Since both Medusavirus and its derivative Medusavirus stheno have free H3 N-termini, we predict that the H3 αN interacts with the H2A' docking domain in the nucleosomes of these viruses. Given that only one of the three lethal residues (H3-Q55) is found in Medusavirus and Medusavirus stheno, it seems likely that the lineages of these viruses are very distant from pre-LECA lineages.
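The notation L/I/V48, I/V51 and Q55 records which residues are accepted at each lethal H3 αN position. A minimal sketch of that check follows, assuming each viral H3 has already been mapped onto S. cerevisiae H3 numbering. The dictionary name, function name and example residues are hypothetical and are not taken from Fig. 4.

```python
# Residues accepted at each lethal H3 alphaN position, in S. cerevisiae H3
# numbering (L/I/V48, I/V51 and Q55 in the text).
LETHAL_ALPHA_N_SITES: dict[int, frozenset[str]] = {
    48: frozenset("LIV"),
    51: frozenset("IV"),
    55: frozenset("Q"),
}


def conserved_lethal_sites(residue_at: dict[int, str]) -> list[int]:
    """Return the lethal alphaN positions at which a viral H3 carries an accepted residue.

    `residue_at` maps yeast H3 position -> aligned viral residue; positions
    missing from the mapping (e.g. alignment gaps) count as not conserved.
    """
    return sorted(
        pos for pos, accepted in LETHAL_ALPHA_N_SITES.items()
        if residue_at.get(pos) in accepted
    )


if __name__ == "__main__":
    # Hypothetical viral H3 carrying accepted residues at all three positions.
    print(conserved_lethal_sites({48: "I", 51: "V", 55: "Q"}))  # [48, 51, 55]
    # Hypothetical viral H3 that keeps only V at position 51.
    print(conserved_lethal_sites({48: "K", 51: "V", 55: "E"}))  # [51]
```

Encoding each position as a set of accepted residues keeps the check faithful to the slash notation, in which any listed residue counts as conserved.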

Surprisingly, residues H39–A47 of S. cerevisiae histone H3, which lie just upstream of H3-L48 (lethal when mutated), are also conserved in Medusavirus and Medusavirus stheno, as well as in five other giant viruses (Fig. 4). The H3 N-tail predicted from the viral and S. cerevisiae sequences is similar to the H3 N-tail previously reconstructed from a variety of extant eukaryotes (Postberg et al., 2010). Notably, the H3 N-tail of S. cerevisiae, but not those of giant viruses, contains a GGK sequence (Fig. 4).

A comparison of the budding yeast canonical nucleosome with the human CENP-A-containing nucleosome reveals a clear structural boundary between H3-A47 and -L48 (Fig. 5A). An interaction between the H3 αN and the H2A' docking domain would be a prerequisite for maintaining the structural and functional integrity of the H3 αN during splitting of the H4-H3 doublet (Fig. 5B(a)). Following splitting of the H4-H3 doublet, the newly formed H3 αN would resemble the unstable αN of extant CENP-A (Fig. 5B(b)). Since some residues in the linker region between H4 and H3 in the doublets of Marseillevirus and Melbournevirus interact with nucleosomal DNA (Fig. 4) (PDB: 7LV8, 7N8N), a DNA-tethering sequence retained at the newly formed H3 N-terminus could make the new H3 αN more stable (Fig. 5B(c)).

Fig. 5. Common sequence of the region N-terminally adjacent to the H3 αN. (A) A functional and structural border around the H3 αN. (a) Comparison between the canonical nucleosome and the CENP-A nucleosome. Although the H3 sequence N-terminally adjacent to the H3 αN is ordered in the canonical nucleosome (S. cerevisiae; 1ID3), the corresponding region in the CENP-A nucleosome (H. sapiens; 3AN2) is disordered, suggesting that a sharp structural boundary exists between H3-A47 and -L48. Indeed, CENP-A nucleosomes wrap less DNA than canonical H3 nucleosomes because their αN helices are looser (Hasson et al., 2013). The accession numbers for the CENP-A N-tail regions used are as follows: S. cerevisiae H3 (UniProtKB P61830), G. lamblia CENP-A (EDO81729), Naegleria gruberi CENP-A (EFC49963.1), T. vaginalis CENP-A (TVAG_224460), A. castellanii CENP-A (XP_004344015.1), A. thaliana CENP-A (NP_001030927.1), S. cerevisiae CENP-A (UniProtKB P36012) and H. sapiens CENP-A (UniProtKB P49450). (B) (a) Splitting of H4-H3. Preserving H3 αN function during H4-H3 splitting should require interaction between the H3 αN and the H2A' docking domain. (b) An unstable H3 αN, as in the extant CENP-A nucleosome. (c) Possible stabilization of the H3 αN. If a DNA-tethering sequence exists N-terminally adjacent to the H3 αN, it could stabilize H3 αN structure and function. (C) Necessity of precise recombination. Histone genes of extant giant viruses are predicted to be relics from a variety of pre-existing LECAs (Irwin and Richards, 2024) (orange hexagons with arrows derived from the pre-existing LECA [large blue arrow toward LECA]). Percentages denote conservation of the 11 H3 residues in Supplementary Figure S1 relative to S. cerevisiae. Since a common H3 sequence is found in a variety of giant viruses (Fig. 4), it seems likely that this common H3 sequence was present in some pre-existing giant viruses and was always precisely recombined into the newly split H3 N-terminus in pre-LECA lineages. Since this recombination could extend the H3 αN by one α-helical turn and bring a DNA-tethering sequence to the H3 N-tail (right purple box), these substantial structural and functional benefits could explain why this precise recombination occurred again and again during nucleosome evolution.

Although the 11 H3 residues (Supplementary Fig. S1) are poorly conserved in Medusavirus (55%) and not perfectly conserved in Clandestinovirus (91%), the H3 N-terminal sequences of these viruses appear well conserved relative to that of budding yeast (Fig. 4). One explanation is that the common, evolutionarily beneficial H3 N-terminal sequence was always recombined into each emerging free H3 following splitting of the H4-H3 doublet (Fig. 5C). Since co-evolution of pre-LECA cells and giant viruses through frequent horizontal transfer of histone genes during eukaryogenesis has been predicted (Irwin and Richards, 2024), we propose that homologous recombination occurred between the histone genes of pre-LECA lineages and those of giant viruses (Fig. 5C). Such recombination would extend the H3 αN by one α-helical turn and tether the H3 αN onto nucleosomal DNA (Fig. 5C, right purple box). In addition to their structural benefits, residues H39–A47 of histone H3 are highly functional in budding yeast (Fig. 4, Supplementary Figs. S3A(a)–(c) and S3B [H3 sequence]) (Sakamoto et al., 2009). Thus, we posit that a strong Darwinian driving force favored repeated horizontal recombination of the same H3 N-terminal sequence during nucleosome evolution.
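The percentages quoted above are per-virus scores over the same 11 H3 positions. Because 11 sites allow only multiples of 1/11, the reported values are consistent with 6 of 11 matching residues (about 54.5%, reported as 55%) for Medusavirus and 10 of 11 (about 90.9%, reported as 91%) for Clandestinovirus. The sketch below shows that calculation. The function name is hypothetical, and the 11-residue strings are placeholders, not the residues of Supplementary Fig. S1.

```python
def percent_conserved(reference: str, query: str) -> float:
    """Percentage of sites at which `query` carries the same residue as `reference`.

    Both strings list one residue per site, in the same site order
    (for example, the 11 H3 sites compared with S. cerevisiae).
    """
    if not reference or len(reference) != len(query):
        raise ValueError("reference and query must be non-empty and equally long")
    matches = sum(r == q for r, q in zip(reference, query))
    return 100 * matches / len(reference)


if __name__ == "__main__":
    yeast_sites = "ACDEFGHIKLM"  # hypothetical placeholder for the 11 yeast residues
    six_match = "ACDEFGWWWWW"    # hypothetical: first 6 of 11 sites match
    ten_match = "ACDEFGHIKLW"    # hypothetical: first 10 of 11 sites match
    print(round(percent_conserved(yeast_sites, six_match)))  # 55
    print(round(percent_conserved(yeast_sites, ten_match)))  # 91
```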

The secondary histone doublet

Primary H4-H3 doublet splitting (Fig. 6A(a)(b)) could have been followed by the first recombination event, yielding an H3 N-terminus identical to those found in Medusavirus and Clandestinovirus (Fig. 6A(c)). A secondary recombination event would then have created the H3-H4 doublet of Medusavirus stheno (Fig. 6B(a)). Similarly, secondary recombination events could have given rise to an H4-H3 doublet (Fig. 6B(b)), an H2B-H2A-H3 triplet (Fig. 6B(c)) and an H2B-H2A-H3-H4 quartet (Fig. 6B(d)), each of which retains, in its H3 portion, a sequence resembling budding yeast H3 K37-P38-H39-R40-Y41-K42-P43-G44-T45-V46-A47 (Fig. 6B). Thus, the evolutionary conservation shown in Figure 6B(a)–(d) supports the predicted structural and functional benefits of this H3 sequence (Fig. 5C).

Fig. 6. Secondary recombination. (A) Primary and secondary recombination during histone evolution. (a)(b) Splitting of the H4-H3 doublet. As shown in Figure 1C, the H4-H3 doublet found in Marseillevirus and Melbournevirus cannot be split owing to disordering of the H3 αN. By contrast, the H3 of the H4-H3 doublet in Wiseana iridescent virus has three H3 αN residues necessary for interaction between the H3 αN and the H2A' docking domain (Fig. 4). Thus, when the H4-H3 doublet in Wiseana iridescent virus was split, H3 αN function would have been preserved. If the linker region between H4 and H3 in the H4-H3 doublet has DNA tethering ability, such tethering would stabilize the H3 αN. (c) Since the LECA (represented by its descendant S. cerevisiae), Medusavirus and Clandestinovirus all have almost identical H3 sequences adjacent to the N region of H3 αN (Fig. 4), this first recombination may have occurred independently but repeatedly in each lineage (Fig. 5C). (B) (a) The H3-H4 doublet found in Medusavirus stheno. Upon fusion of H3 and H4, this secondary H3-H4 doublet sacrificed the free H4 N-tail that had been established in Medusavirus. (b) The secondary H4-H3 doublet in Loki’s Castle. Upon fusion of H4 and H3, the H3 sequence adjacent to the N region of H3 αN is retained in the linker region, suggesting that this common H3 sequence is highly beneficial for both structure and function during histone evolution. (c) Secondary H2B-H2A-H3 triplet. (d) Secondary H2B-H2A-H3-H4 quartet. All of the secondary fusion products (a)–(d) retain the common sequence adjacent to the N region of H3 αN, as in Figure 6A(c). (C) The H2A-H2B doublet. (a) The H2B-H2A doublet, as the primary molecule, is found in a variety of giant viruses. (b) Few giant viruses have free H2A and free H2B. (c) The H2A-H2B doublet is found in Indivirus and Klosneuvirus. Since the H2B-H2A doublet is more stable than H2A-H2B (Fig. 2C), the H2A-H2B doublet appears to be a secondary doublet formed by fusion of free H2A and free H2B. (D) Histone N-tails. Every histone doublet, triplet and quartet (Figs. 6B and 6C) sacrifices at least one N-tail of H2A, H2B, H3 or H4, suggesting that histone tails are not essential for the structure or function of giant virus nucleosomes. Thus, the histone N-tails of pre-LECA lineages and Medusavirus would not be predicted to be essential for nucleosome structure or function. (E) N-tail deletion mutants. The structure of human nucleosomes lacking the N-tail of H2A (3W96), H2B (3W97), H3 (3W98) or H4 (3W99) is essentially the same as that of the wild-type nucleosome. Budding yeast histone N-tail deletion mutants are viable. Screening results for comprehensive histone point mutants in budding yeast also indicated no significant biological role for histone N-tails (Supplementary Figs. S3B and S3C).

Since we have defined the H2B-H2A doublet as the primary state of H2A and H2B (Figs. 1 and 2), we postulate that the H2A and H2B of the LECA and Medusavirus were derived from splitting of the H2B-H2A doublet (Fig. 6C(a)(b)). It follows that the H2A-H2B doublet found in two giant viruses (Fig. 6C(c)) should be a secondary recombination product derived from free H2A and H2B molecules. In each secondary recombination event (Figs. 6B and 6C), at least one N-tail of H2A, H2B, H3 or H4 appears to have been sacrificed, suggesting that no individual histone tail was structurally or functionally essential for the ancient nucleosome (Fig. 6D). Marseillevirus nucleosomes, which retain only the N-tails of H2B and H4 (see Fig. 7A(a)), are tightly packed with no phasing over genes (Bryson et al., 2022), implying that the primary role of viral nucleosomes is tight packaging of DNA rather than transcriptional regulation via histone modifications.

Fig. 7. Histone variants. (A) (a)(b)(c)(d) The availability of free histone N-tails and possible emergence of histone variants in a variety of giant virus nucleosomes. Among these, only Medusavirus has all four N-tails and the potential for all histone variants. (B) (a) Distribution of histone variants across eukaryotes. Open circles and crosses represent the presence and absence of the corresponding histone variants. H2A.Z and CENP-A are H2A and H3 variants, respectively. (b) H2A.Z in the LECA. We infer that the LECA had H2A.Z. (C) (a) Alignment of the H4 region containing H4-Y98. (b) Alignment of the H2B region containing H2B-D71. H2B-D71, H4-L97, -Y98 and -G99, all of which are required for H2A.Z chromatin binding, are specifically conserved in eukaryotes. (D) Possible steps yielding free H2A, H2B, H3 and H4 from H2B-H2A and H4-H3 doublets. (a)(b) We tentatively define steps ①–⑥. Step 1 (①): establishment of the interaction between H3 αN and the H2A' docking domain in a nucleosome containing both H2B-H2A and H4-H3 doublets. Step 2 (②): splitting of the linker region between H2B and H2A in the H2B-H2A doublet. Step 3 (③): emergence of the H2A variant H2A.Z. Step 4 (④): splitting of the linker region between H4 and H3 in the H4-H3 doublet. Step 5 (⑤): horizontal acquisition of the common H3 N-tail by the newly generated free H3 N-end (Figs. 5 and 6). Step 6 (⑥): emergence of the H3 variant CENP-A. (E) Brief history of eukaryotic histone residues around the birth of the LECA. Colors of DNA and histone chains in each nucleosome structure are as described in the corresponding PDB entry. Taking into account the six steps (①–⑥) described in (D) together with the conclusions drawn from Figures 1–6, we reconstruct a possible history of eukaryotic histone residues around the birth of the LECA. H4-R36, -R39, -L90 and -Y98 (Supplementary Fig. S1 and Fig. 7C(a)) were established before step 1 (①). H3-L48, -I51, -R52, -F54, -Q55 and H4-R40 were established during step 1 (①) (Supplementary Fig. S2). After splitting of the H2B-H2A doublet (②), H2A.Z emerged (③). H2B-D71, together with H4-Y98, began to regulate H2A.Z chromatin binding (Fig. 2D and Fig. 7C(b)). The interaction between H3 αN and the H2A.Z' docking domain descended directly from step 1 (①). After splitting of the H4-H3 doublet (④), the newly freed C-terminal region of the H4 histone fold (including H4-Y98) formed a β-sheet with H2A' βC. H4-L97 and -G99, the neighbors of H4-Y98 (Fig. 7C(a)), began to regulate H2A.Z chromatin binding together with H2B-D71 and H4-Y98. During the gain of the common H3 N-tail (Figs. 5 and 6) (step 5; ⑤), H3-R49 and -E50 on H3 αN, which face nucleosomal DNA, began to interact with DNA (Fig. 4). H4-Y72, which is involved in the interaction between H2B and H4, became fixed by chance specifically in the eukaryotic nucleosome (Supplementary Fig. S1). After step 5 (⑤), the ‘alternative candidate of LECA’ appears to have been established (Fig. 7B(b)). Duplication of H3, yielding the H3 variant CENP-A (⑥), led to the evolution of the CENP-A-dependent chromosome segregation system. The LECA is postulated to have had both H2A.Z and CENP-A (Grau-Bové et al., 2022). Giardia lamblia and T. brucei are predicted to have lost H2A.Z and CENP-A, respectively. On the other hand, if T. brucei is derived from the ‘alternative candidate of LECA’, it is a direct descendant with a CENP-A-independent chromosome segregation system. The accession numbers used for H4 in the analysis are as follows: S. cerevisiae H4 (UniProtKB P02309), H. sapiens H4 (UniProtKB P62805), Gallus gallus H4 (UniProtKB P62801), A. castellanii H4 (XP_004353724), A. thaliana H4 (AEC08165), G. lamblia H4 (UniProtKB A8BUJ9), T. brucei H4 (UniProtKB Q57Z31), Cc_bracovirus H4 (YP_184795.1), Wiseana iridescent virus H4-H3 (YP_004732864.1), Marseillevirus H4-H3 (YP_003407137.1), Melbournevirus H4-H3 (YP_009094869.1), Clandestinovirus H4 (QYA18737.1) and Medusavirus H4 (BBI30394.1). The accession numbers for H2B used in the analysis are as follows: S. cerevisiae H2B (UniProtKB P02294), H. sapiens H2B (UniProtKB P06899), G. gallus H2B (UniProtKB P0C1H5), A. castellanii H2B (XP_004341446), A. thaliana H2B (CAA73156), G. lamblia H2B (UniProtKB A8BI78), T. brucei H2B (UniProtKB Q389T1), Pandoravirus H2B (OFAI01000004), Indivirus H2A-H2B (Talbert et al., 2022), Marseillevirus H2B-H2A (ADB04176), Melbournevirus H2B-H2A (YP_009094870.1), Clandestinovirus H2B-H2A (QYA187) and Medusavirus H2B (BBI30201.1).

Moreover, in extant organisms, deletion of individual histone N-tails is not lethal in budding yeast (Fig. 6E) (Kim et al., 2012), and deletion of each N-tail in the human nucleosome does not substantially affect nucleosome structure (Fig. 6E) (Iwasaki et al., 2013). Thus, we propose that all histone N-tails were less important in pre-LECA lineages and giant viruses than they are in extant eukaryotic cells.

Histone variants in LECA

We next addressed the origin of histone variants. In addition to sacrificing histone N-tails, histone doublets, triplets and quartets may have prevented the emergence of variants of the fused histones (Figs. 6B and 6C). Among the giant viruses examined, only Medusavirus has all four N-tails and the potential for the emergence of a variant of each histone (Fig. 7A). Furthermore, although most extant eukaryotes have both H2A.Z (an H2A variant) and CENP-A (an H3 variant), primitive eukaryotes such as G. lamblia and T. brucei have only one of the two (Fig. 7B(a)) (Akiyoshi and Gull, 2014; Talbert et al., 2019; Grau-Bové et al., 2022). Phylogenetic comparison of the nucleosome domain required for H2A.Z to bind chromatin (Fig. 7C) leads us to propose that the LECA had H2A.Z (Fig. 7B(b)). We have previously demonstrated that each of the point mutants H4-L97A, -Y98A, -G99A and H2B-D71A lacks chromatin-bound H2A.Z (Kawashima et al., 2011; Nakabayashi et al., 2014, 2020; Nakabayashi and Seki, 2022). Since H4-L97, -Y98, -G99 (Fig. 7C(a)) and H2B-D71 (Fig. 7C(b)) are perfectly conserved in G. lamblia, we infer that the LECA had H2A.Z and that G. lamblia subsequently lost it; this inference is consistent with a previous report (Grau-Bové et al., 2022).

The emergence of H2A.Z (Fig. 7B(b)) and of a nucleosome domain for H2A.Z binding (Fig. 7C) may indicate the order in which the histone doublets split. We propose that the H2B-H2A doublet (Fig. 7D(a)) split first, followed by the H4-H3 doublet (Fig. 7D(b)) (Fig. 6A). Phylogenetic analyses suggest that H4-L90 and -Y98 appeared in pre-LECA lineages, as in some extant giant viruses (Fig. 7C(a) and Supplementary Fig. S1). With interactions between the H3 αN, the H4 α1 and the H2A' docking domain in place in pre-LECA lineages, H3-L48, -I51, -R52, -Q55 and H4-R40 (Supplementary Fig. S2) would have gained function. Splitting of H2B-H2A would then have enabled the emergence of H2A.Z, after which the H4-Y98 and H2B-D71 pair could have acquired the structural and functional capacity to maintain H2A.Z on chromatin. Thus, the interaction between the H3 αN and the H2A.Z' docking domain (PDB: 1F66) may descend directly from the original interaction between the H3 αN and the H2A' docking domain (Fig. 7E).

Splitting of the H4-H3 doublet would have freed the H4 C-terminus containing H4-Y98, allowing the adjacent H4-L97 and -G99 residues to gain a function in maintaining H2A.Z on chromatin (Fig. 7C(a)). Independently, the newly created H3 N-terminus could have acquired a common H3 N-tail resembling the extant eukaryotic one (Figs. 5, 6 and 7E). We posit that this event marks the origin of the LECA, as the ‘alternative candidate of LECA’ (Fig. 7E), within Euglenozoa, to which T. brucei belongs. Of note, the ‘postulated LECA’ proposed in many studies lies outside the Euglenozoa (Fig. 7E) (Al Jewari and Baldauf, 2023).

Importantly, if the ‘alternative candidate of LECA’ is correct, the LECA would resemble T. brucei, which lacks a CENP-A-dependent chromosome segregation system. Following the emergence of CENP-A, a transition from a CENP-A-independent to a CENP-A-dependent segregation system, as found in most extant eukaryotes, would have occurred (Fig. 7E). Trypanosoma brucei has 27 kinetoplastid kinetochore proteins, including KKT10 and KKIP7, but lacks CENP-A and the conventional kinetochore proteins Ndc80 and Nuf2 (Butenko et al., 2020). By contrast, most eukaryotes, including yeast and humans, have CENP-A, Ndc80 and Nuf2, but not KKT10 or KKIP7. Euglena gracilis, a member of Euglenozoa, has CENP-A, Ndc80, Nuf2, KKT10 and KKIP7. Since E. gracilis could represent an intermediate in the transition from a CENP-A-independent to a CENP-A-dependent segregation system, we propose that splitting of the H2B-H2A doublet led to the emergence of the H2A variant H2A.Z, and that splitting of the H4-H3 doublet ultimately led to the establishment of the CENP-A-dependent segregation system after the emergence of the LECA (‘alternative candidate of LECA’).
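The stepwise scenario can be summarized as a small lookup table. The sketch below is hypothetical bookkeeping, not a phylogenetic method: it assumes a strictly linear order of steps ①–⑥ (Fig. 7D), uses a made-up step 0 for residues said to predate step 1, and lists only the residues that the Fig. 7E caption assigns to a step. The names `STEPS` and `residues_fixed_by` are illustrative; querying the table returns the residues expected to be fixed by a given step.

```python
# Hypothetical encoding of the stepwise model (Fig. 7D/E); step 0 = before step 1.
# Each entry: (event, residues the Fig. 7E caption assigns to that step).
STEPS: dict[int, tuple[str, frozenset[str]]] = {
    0: ("pre-existing residues", frozenset({"H4-R36", "H4-R39", "H4-L90", "H4-Y98"})),
    1: ("H3 αN contacts the H2A' docking domain",
        frozenset({"H3-L48", "H3-I51", "H3-R52", "H3-F54", "H3-Q55", "H4-R40"})),
    2: ("H2B-H2A doublet splits", frozenset()),
    3: ("H2A.Z emerges", frozenset({"H2B-D71"})),
    4: ("H4-H3 doublet splits", frozenset({"H4-L97", "H4-G99"})),
    5: ("free H3 N-end acquires the common H3 N-tail",
        frozenset({"H3-R49", "H3-E50", "H4-Y72"})),
    6: ("CENP-A emerges", frozenset()),
}


def residues_fixed_by(step: int) -> set[str]:
    """Union of residues assigned to all steps up to and including `step`."""
    fixed: set[str] = set()
    for s in sorted(STEPS):
        if s > step:
            break
        fixed |= STEPS[s][1]
    return fixed


# Residues that first appear at step 3 (emergence of H2A.Z).
print(sorted(residues_fixed_by(3) - residues_fixed_by(2)))  # ['H2B-D71']
print(len(residues_fixed_by(6)))  # 16
```

The 16 residues returned for step 6 are only those placed on the timeline in the Fig. 7E caption, not the full set of 26 conserved residues discussed elsewhere in the paper.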

Notably, regardless of whether the ‘alternative candidate of LECA’ or the ‘postulated LECA’ is correct, some species, such as G. lamblia, lost H2A.Z (Fig. 7E), consistent with a previous report (Grau-Bové et al., 2022). By contrast, T. brucei lost CENP-A if it descended from the ‘postulated LECA’, consistent with a previous report (Grau-Bové et al., 2022), but lost nothing if it descended from the ‘alternative candidate of LECA’ (Fig. 7E).

Differences in sequences of N-tails between T. brucei and S. cerevisiae

Another key unanswered question is why H3 N-tail sequences vary. Like S. cerevisiae H2A.Z, T. brucei H2A.Z, together with the Kinetoplastea-specific histone H2B variant H2B.V, is enriched at transcription start sites (TSSs) (Fig. 8A) (Kraus et al., 2020). As shown in Figure 8B, the amino acid sequences of canonical histone N-tails in T. brucei differ substantially from those of S. cerevisiae. Specifically, T. brucei H2A.Z is more enriched in GK/GGK sequences than S. cerevisiae H2A.Z (Fig. 8B(a) and Supplementary Figs. S4A and S9). The Kinetoplastea-specific H2B.V is also rich in GK/GGK sequences (Fig. 8B(a) and Supplementary Fig. S4B). Given that the GK/GGK sequences in the S. cerevisiae H4 N-tail are regulated by acetylation (Grunstein and Gasser, 2013), we hypothesize that nucleosomes containing T. brucei H2A.Z and H2B.V at TSSs are hyperacetylated. Indeed, acetylation of nucleosomes at TSSs in T. brucei has been reported (Kraus et al., 2020), which could help explain the prokaryotic-type transcriptional regulation in T. brucei (Faria, 2021).

Fig. 8. Different histone N-tails in T. brucei. (A) H2A.Z accumulates at transcription start sites (TSSs). As in S. cerevisiae, where H2A.Z (budding yeast Htz1) accumulates at TSSs, T. brucei H2A.Z, together with the Kinetoplastea-specific H2B variant H2B.V, accumulates at TSSs (Supplementary Fig. S4). (B) Sequences of histone tails. Canonical and variant histone N-tails are shown for (a) T. brucei and (b) S. cerevisiae. Blue: DNA-binding residues. Pink: residues that are lethal when mutated. Red-bean color: residue interacting with the H2A’ docking domain. Red lysines: modifiable Lys residues. Green: GK or GGK sequences, which are mainly regulated by acetylation/deacetylation. (C) Numbers of GK/GGK sequences in histone N-tails of canonical and variant nucleosomes in each eukaryotic lineage: (a) T. brucei, (b) S. cerevisiae, (c) A. castellanii and (d) A. thaliana. The sequence of each histone N-tail used to count GK/GGK numbers is shown in Supplementary Figures S4 and S7–S10. The accession numbers for histones and histone variants used in the analysis are as follows: T. brucei H2A (UniProtKB Q57YA3), T. brucei H2B (UniProtKB Q389T1), T. brucei H3 (UniProtKB Q4GYX7), T. brucei H4 (UniProtKB Q57Z31), T. brucei H2A.Z (RHW71403), T. brucei H2B.V (RHW67227), S. cerevisiae H2A (UniProtKB P04911), S. cerevisiae H2B (UniProtKB P02294), S. cerevisiae H3 (UniProtKB P61830), S. cerevisiae H4 (UniProtKB P02309) and S. cerevisiae H2A.Z (UniProtKB Q12692).

In terms of GK/GGK number, nucleosomes at TSSs in T. brucei have 30 GK/GGKs, whereas the canonical nucleosome has only two (Fig. 8C(a)). By contrast, in S. cerevisiae, Acanthamoeba castellanii and Arabidopsis thaliana, the canonical nucleosome has 12–14 GK/GGKs, while the TSS nucleosome has 14, 18 and 14 GK/GGKs, respectively (Fig. 8C(b)–(d)), reflecting the consistently larger number of GK/GGK sequences in H2A.Z than in canonical H2A (see Supplementary Fig. S9). Of note, GK/GGKs may be acetylation sites that cannot be methylated or ubiquitylated.
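Counting GK/GGK motifs reduces to a string search. The sketch below makes two assumptions that the text does not state: a GGK is counted once (not as both a GK and a GGK), and a per-nucleosome total is twice the sum over the four N-tails because the octamer contains two copies of each histone. The function names and tail fragments are hypothetical placeholders, not real histone sequences.

```python
def count_gk_motifs(tail: str) -> int:
    """Count lysines immediately preceded by glycine; a GGK contributes one motif.

    Every GK or GGK motif contains exactly one 'GK' dipeptide, and 'GK'
    occurrences cannot overlap, so str.count gives the motif number.
    """
    return tail.upper().count("GK")


def nucleosome_gk_total(tails: dict[str, str], copies_per_octamer: int = 2) -> int:
    """Total GK/GGK motifs in one nucleosome, assuming two copies of each histone."""
    return copies_per_octamer * sum(count_gk_motifs(seq) for seq in tails.values())


# Hypothetical N-tail fragments (placeholders, not real histone sequences).
tails = {
    "H2A": "SGGKAGK",    # GGK + GK -> 2
    "H2B": "AKPAEK",     # none -> 0
    "H3": "ARTKQGGKA",   # GGK -> 1
    "H4": "SGRGKGLGK",   # GK + GK -> 2
}
print({name: count_gk_motifs(seq) for name, seq in tails.items()})
print(nucleosome_gk_total(tails))  # 10
```

Swapping the H2A entry for an H2A.Z tail (or H2B for H2B.V) in the dictionary gives the corresponding TSS-nucleosome total under the same assumptions.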

Next, we addressed the H3 N-tails found in a variety of eukaryotes, paying attention to GK/GGK sequences as an apparent marker of sequence variation (Fig. 9).

Fig. 9. Differentiation of the H3 N-tail during evolution. (A) Two types of reconstructed H3 N-tails in the pre-LECA, as in Figure 4. (B) H3 N-tails found in a variety of eukaryotic lineages fall mainly into four groups (a–d). Group c has almost the same H3 N-tail sequence as budding yeast and as the H3 N-tail (+GGK) of the ‘postulated LECA’ in (A) (Postberg et al., 2010). The accession numbers for histone H3 used in the analysis are as follows: Bodo saltans (GenBank CUG87252.1), Strigomonas culicis (GenBank EPY33365.1), Leishmania major (UniProtKB Q4QHB5), T. brucei (UniProtKB Q4GYX7), Perkinsela sp. (GenBank KNH08162.1), E. gracilis (ELL00004092; Postberg et al., 2010), N. gruberi (GenBank EFC36852.1), T. vaginalis (GenBank CAA66646.1), A. thaliana (GenBank AAA32809.1), A. castellanii (UniProtKB L8H5A2), S. cerevisiae (UniProtKB P61830), Stygiella incarcerata (GenBank ANM86119.1), G. lamblia (UniProtKB E2RU29), G. muris (GenBank TNJ30187.1), Trepomonas sp. (GenBank JAP92024.1), Spironucleus salmonicida (GenBank KAH0577014.1) and S. vortens (GenBank ABG76197.1). (C) The root of the LECA is uncertain. Since the root of the LECA has not been determined and is hotly debated, we tentatively use one of the latest hypotheses based on phylogenetic analysis (Al Jewari and Baldauf, 2023), with the ‘postulated LECA’ marked by ‘0’ in the black circle. The ‘postulated LECA’ should have H2A.Z, CENP-A and an H3 tail containing the GGK sequence (Fig. 4). Starting from the ‘0’ point, G. lamblia is the second branch in the evolutionary tree, whereas Discoba is the fourth. The group c sequence of the H3 N-tail (B) is found in a variety of eukaryotic lineages. Taking into account the evolution of eukaryotic canonical histones analyzed in this study (Figs. 1–8), we propose the ‘alternative candidate of LECA’, marked by ‘1’ in the green oval. The ‘alternative candidate of LECA’ should have H2A.Z but not CENP-A. Moreover, the ‘alternative candidate of LECA’ has an H3 N-tail lacking a GGK sequence (A), like the H3 N-tails of Medusavirus, Medusavirus stheno and Clandestinovirus (Fig. 4).

Possible differentiation of extant H3 N-tails from the predicted H3 N-tail in the LECA

Since the H3 N-tails of Medusavirus, Medusavirus stheno and Clandestinovirus resemble that of S. cerevisiae, we can predict the H3 N-tail of the LECA (‘alternative candidate of LECA’), which lacks GGK (Fig. 4 and Fig. 9A, upper line). By contrast, reconstruction from the H3 N-tail sequences of extant eukaryotes yields a GGK-containing H3 N-tail for the LECA (‘postulated LECA’) (Postberg et al., 2010) (Fig. 4 and Fig. 9A, lower line). Notably, since lysine residues such as H3-K4, -K9, -K27 and -K36 are predicted to be conserved in both the ‘alternative candidate of LECA’ and the ‘postulated LECA’ (Fig. 9A), the LECA likely already had lysine methyltransferases acting on these residues, as proposed previously (Grau-Bové et al., 2022). Besides methyltransferases, a variety of histone H3 tail interactors in extant eukaryotes could have interacted with the predicted ancient H3 N-tails, given the high conservation between the human H3 N-tail and those in Figure 9A (Supplementary Fig. S5).

After aligning H3 N-tails from a variety of eukaryotes, we mapped groups a–d (Fig. 9B) onto the phylogenetic tree of eukaryotes (Al Jewari and Baldauf, 2023) (Fig. 9C). Starting from the ‘postulated LECA’, Discoba (T. brucei and E. gracilis) could be the fourth branch of the eukaryotic tree, after Parabasalia (Trichomonas vaginalis), Fornicata (G. lamblia) and Preaxostyla (Fig. 9C) (Al Jewari and Baldauf, 2023).

Remarkably, although E. gracilis (Fig. 9B, group c) belongs to Euglenozoa and T. vaginalis (Fig. 9B, group c) belongs to Parabasalia, their GGK-containing H3 N-tails are almost identical to that of S. cerevisiae (Figs. 9A and 9B, group c) (Postberg et al., 2010). Conversely, T. brucei (Fig. 9B, group a), which belongs to Euglenozoa, and G. lamblia (Fig. 9B, group d), which belongs to Fornicata, have H3 N-tails that lack GGK and differ greatly from that of S. cerevisiae (Fig. 9B, group c).

Starting from the ‘postulated LECA’, which has H2A.Z, CENP-A and a GGK-containing H3 N-tail, G. lamblia lost both H2A.Z and the GGK sequence in the H3 N-tail, and T. brucei lost both CENP-A and the GGK sequence. By contrast, starting from the ‘alternative candidate of LECA’, G. lamblia lost H2A.Z, whereas T. brucei lost nothing. Loss of CENP-A, but not of H2A.Z, is lethal in S. cerevisiae, implying that H2A.Z could have been lost much more easily than CENP-A in primitive eukaryotes. Thus, we favor the ‘alternative candidate of LECA’ model, although few previous reports support it (Cavalier-Smith, 2010; Akiyoshi and Gull, 2014). Notably, the LCEA (last common euglenozoan ancestor) has been reconstructed (Vesteg et al., 2019). If the ‘alternative candidate of LECA’ is the true LECA, the LECA would closely resemble the LCEA (Fig. 9C).
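The comparison above amounts to set differences between an ancestor's features and each descendant's. The sketch encodes only the three features discussed in this paragraph, with presence and absence as stated in the text; it counts losses and ignores gains (for example, CENP-A in G. lamblia under the ‘alternative candidate of LECA’), and it is not a parsimony or phylogenetic analysis. The dictionary and function names are illustrative.

```python
# Presence of the three features discussed in the text (set membership = present).
ANCESTORS: dict[str, set[str]] = {
    "postulated LECA": {"H2A.Z", "CENP-A", "GGK in H3 N-tail"},
    "alternative candidate of LECA": {"H2A.Z"},
}
DESCENDANTS: dict[str, set[str]] = {
    "G. lamblia": {"CENP-A"},
    "T. brucei": {"H2A.Z"},
}


def losses(ancestor: set[str], descendant: set[str]) -> set[str]:
    """Features present in the ancestor but absent from the descendant (gains ignored)."""
    return ancestor - descendant


for anc_name, anc in ANCESTORS.items():
    for desc_name, desc in DESCENDANTS.items():
        lost = sorted(losses(anc, desc))
        print(f"{anc_name} -> {desc_name}: lost {', '.join(lost) if lost else 'nothing'}")
```

Run as written, the loop reproduces the four outcomes stated above: two losses each for G. lamblia and T. brucei from the ‘postulated LECA’, and one loss (H2A.Z, in G. lamblia) from the ‘alternative candidate of LECA’.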

Finally, besides the ‘alternative candidate of LECA’ and the ‘postulated LECA’ in Figure 9C, many other positions for the root of eukaryotes (that is, the LECA) have been proposed (Hampl et al., 2009; Derelle et al., 2015; Gabaldón, 2021). Any future LECA model should be able to explain the evolutionary history of eukaryotic canonical and variant nucleosomes using the information presented in this study.

DISCUSSION

This study used 26 conserved histone residues as an evolutionary searchlight to elucidate the pathway via which the archaeal nucleosome evolved into the extant eukaryotic nucleosome (Fig. 2D and Supplementary Figs. S1, S6A and S6B).

Among the 26 histone residues, eight residues were already present in the archaeal nucleosome: H3-E97, -H113, -R116, -T118, -D123; H4-R36, -K39 (corresponding to yeast R39) and -R45 (Supplementary Figs. S1 and S6B(a)). We propose that following the establishment of a nucleosome containing two H2B-H2A doublets and two H4-H3 doublets (Figs. 7E and 9C and Supplementary Fig. S6B(b)), 16 histone residues (H2A-Y58, -E62, -L66, -D91, -E93, -L94; H2B-L109; H3-L48, -I51, -F54, -Q55, -L62, -L130; H4-R40, -L90 and -Y98) in pre-LECA lineage cells co-evolved with those of giant viruses (Supplementary Fig. S6B(c)). Thus, nucleosomes of Marseillevirus and Melbournevirus (Supplementary Fig. S6B(d)), in which no interaction occurs between H3 αN and the H2A' docking domain (Fig. 3A(b)), may be directly descended from the first nucleosome (Supplementary Fig. S6B(b)).

During parallel evolution of nucleosomes in a variety of pre-LECA lineages, some nucleosomes occasionally acquired an interaction between the H3 αN and the H2A' docking domain (Fig. 7E ①), allowing separation of the H4-H3 doublet (Figs. 3, 5 and 7). In our model, the H2B-H2A doublet split before the H4-H3 doublet in a eukaryotic lineage nucleosome (Fig. 7E and Supplementary Fig. S6B(e)). Soon after, H2A.Z emerged, and H2B-D71, which interacts with H4-Y98, became fixed in the H2B sequence to stabilize H2A.Z on chromatin.

Next, the H4-H3 doublet split, and the newly formed H3 N-terminus acquired the common H3 N-tail (Fig. 7E and Supplementary Fig. S6B(f)), establishing the LECA (‘alternative candidate of LECA’). At that time, H4-Y72, a residue that is lethal when mutated and is involved in the interaction between H4 and H2B at their four-helix bundle, became fixed by chance in the eukaryotic lineage H4 sequence (Supplementary Fig. S6B(f)), completing the accumulation of all 26 conserved residues listed in Fig. 2 and Supplementary Figs. S1 and S6A. Duplication of H3 then gave rise to the H3 variant CENP-A (Figs. 7E and 9C and Supplementary Fig. S6B(g)), which would have allowed the CENP-A-dependent chromosome segregation system to be established (Figs. 7E and 9C).

Finally, because this study gained insights into the evolution of eukaryotic histones using only 26 histone residues (Fig. 2D and Supplementary Fig. S1) out of a total of 170 functional residues (Supplementary Figs. S3B, S3C and S6A), future structural and phylogenetic approaches that examine all 170 residues should enable a more precise analysis of the evolutionary path of the eukaryotic nucleosome at single-amino-acid resolution.

CONFLICTS OF INTEREST

The authors declare that they have no known competing financial interests or personal relationships that could appear to influence the work reported in this paper.

ACKNOWLEDGMENTS

We thank Dr. M. Horikoshi (Jobu University) for the initial analyses of the histone point mutant library and the H2B-H2A doublet. We thank Ms. M. Seki for illustrating some of the figures. This study was financially supported by the Tohoku Medical and Pharmaceutical University, whose founding spirit is “We will open the gate of truth”; thus, we wish to extend our gratitude to Dr. M. Takayanagi, president of the University.

REFERENCES
 
© 2025 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.