Asynchronous evolution of centromeric sequences across chromosomes in Pyricularia oryzae

Atsumi Morimoto; Thach An Dang; Ken-ichi Ikeda; Hitoshi Nakayashiki

doi:10.1266/ggs.24-00208

ABSTRACT

Centromeres are essential for chromosome segregation, yet they are among the most rapidly evolving regions of the genome. The mechanisms driving this rapid evolution of centromeric sequences are still not well understood. In this study, we identified the centromeric sequences of the wheat-infecting fungus Pyricularia oryzae (strain Br48) using CENP-A chromatin immunoprecipitation followed by high-throughput sequencing. The Br48 centromeres range from 71 kb to 101 kb in length and are highly AT-rich (72.1–75.5%) and repeat-rich (63.4–85.0%). These regions are also enriched for H3K9me3 and 5-methylcytosine but depleted of H3K4me2 and H3K27me3. During the analysis of repetitive sequences in the Br48 centromere, we identified a stretch of approximately 530 bp that is tightly associated with centromeres in P. oryzae. We named this element the CenIR (centromere-associated IR element), as it often forms inverted repeat structures with two elements adjacent in reverse orientation. A comparison of putative centromere sequences across phylogenetically distinct P. oryzae strains suggests that changes in centromeric sequences are non-uniform across chromosomes and do not always align with the fungal phylogenetic relationships. Repeat-induced point mutation (RIP)-like C:G to T:A transitions likely accelerate base substitutions in the centromeres of Pyricularia fungi.

INTRODUCTION

During cell division, DNA condenses into chromosomes and is segregated to opposite poles in anaphase, resulting in the equal distribution of genetic material (Perea-Resa and Blower, 2018; Talbert and Henikoff, 2020). Microtubules attach to a specialized DNA region known as the centromere, facilitating this process. The centromere is essential for faithful chromosome segregation in all eukaryotes. Despite their conserved function, centromeric DNA sequences exhibit remarkable diversity across species and even within the same species. This paradox of functional conservation and sequence diversity is a fundamental enigma in centromere biology (Malik and Henikoff, 2002).

In contrast, the proteins associated with the centromere are highly conserved across eukaryotes. In particular, CENP-A, a variant of histone H3, plays a critical role in recruiting other centromeric proteins for the formation and maintenance of centromeric chromatin. Orthologs of CENP-A, referred to as CenH3 or Cse4 in certain species, are widely conserved, albeit with sequence variation, across animals, plants and fungi, underscoring their universal importance in centromere function (Quénet and Dalal, 2012; Smith et al., 2012). Centromeric regions exhibit heterochromatic properties, marked by epigenetic modifications such as H3K9me3 and cytosine methylation. Thus, it is believed that the centromere region is defined epigenetically by these modifications and the histone variant (Wong et al., 2020). Centromeres are also characterized by the accumulation of repetitive sequences represented by transposable elements (TEs). Emerging evidence suggests that these sequences contribute to the establishment of functional centromeres (Hartley and O’Neill, 2019).

Pyricularia oryzae (syn. Magnaporthe oryzae) is a phytopathogenic fungus responsible for blast diseases in various gramineous plants, including economically important crops such as rice and wheat (Kato et al., 2000). Host-specific P. oryzae pathotypes are genetically distinguishable, and thus form phylogenetically distinct subgroups (Gladieux et al., 2018; Asuke et al., 2023). The centromere sequences of a rice-infecting strain of P. oryzae were previously determined (Yadav et al., 2019). However, the centromere dynamics among the distinct subgroups of P. oryzae remain to be examined in detail. Here, we identified the centromeric sequences of a wheat-infecting P. oryzae strain, Br48, using CENP-A chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). We also investigated centromere sequence variation, with a focus on repetitive sequences and centromere diversification across distinct P. oryzae strains.

RESULTS AND DISCUSSION

Short reads from CENP-A ChIP-seq analysis were mapped to the Br48 genome, along with reads from our previous ChIP-seq analyses of H3K4me2, H3K9me3 and H3K27me3, as well as methylated DNA IP sequencing (MeDIP-seq) (Pham et al., 2015; Van Vu et al., 2021; Kobayashi et al., 2023). The mapping data clearly identified one centromere on each chromosome (Supplementary Fig. S1), with co-localized peaks of CENP-A, H3K9me3 and 5-methylcytosine (5mC), along with a depletion of H3K4me2 and H3K27me3 (Fig. 1). These heterochromatic features are typical of centromeres in diverse organisms, where they often extend into the flanking pericentromeric regions. In P. oryzae, however, the heterochromatic features are observed only in the CENP-A-enriched regions. The Br48 centromeres range from 71 kb to 101 kb in length and are highly AT-rich (72.1% to 75.5%) (Table 1). The chromosomal positions of the Br48 centromeres are mostly consistent with those predicted in the wheat-infecting strain B71, with slight differences in length on chromosomes 3 and 7 (Yadav et al., 2019).

Fig. 1. Identification of centromeres in a wheat-infecting strain (Br48) of Pyricularia oryzae. Reads obtained from CENP-A ChIP-seq analysis identified one distinct enriched region on each of the seven Br48 chromosomes. The centromeric regions were determined using the HOMER “findPeaks” command with the parameters “-size 500” and “-minDist 10000” (Heinz et al., 2010). Alongside the CENP-A ChIP-seq data, GC content and mapping data from MeDIP-seq (5mC) and ChIP-seq analyses of three major histone modifications (H3K4me2, H3K9me3 and H3K27me3) are presented. The Y-axis scales are set to 0–150 RPM for all ChIP-seq data, 0–350 RPM for MeDIP-seq data, and 0–100% for GC content. The numbers in parentheses indicate the genomic positions of the regions shown in the figure based on the Br48 genome assembly (AP027063–AP027069).

Table 1. Centromeres in the Pyricularia oryzae Br48 genome

Chromosome^a	Position (bp)^a	Length (bp)	GC (%)	Repeats (%)^b
Ch1	5,216,781–5,317,708	100,928	27.3	74.8
Ch2	393,038–480,645	87,608	27.9	64.0
Ch3	6,391,188–6,462,491	71,304	25.5	79.1
Ch4	816,104–894,410	78,307	24.5	77.8
Ch5	295,348–366,353	71,006	25.0	63.4
Ch6	5,878,476–5,949,284	70,809	24.7	73.7
Ch7	3,259,602–3,354,300	94,699	26.5	85.0

^a Chromosome and nucleotide numbers are presented according to the Br48 genome assembly (AP027063–AP027069).

^b Percentage of sequences homologous to Br48 repeat sequences as listed in Supplementary Table S1.
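
As a quick consistency check on Table 1, the following minimal sketch recomputes each centromere length from its inclusive coordinates and derives AT content as 100 minus GC %. The numbers are copied from Table 1; the names TABLE_1, span_length and at_percent are illustrative and are not part of the analysis pipeline used in this study.

```python
# Values copied from Table 1:
# (chromosome, start, end, reported length in bp, GC %).
TABLE_1 = [
    ("Ch1", 5_216_781, 5_317_708, 100_928, 27.3),
    ("Ch2", 393_038, 480_645, 87_608, 27.9),
    ("Ch3", 6_391_188, 6_462_491, 71_304, 25.5),
    ("Ch4", 816_104, 894_410, 78_307, 24.5),
    ("Ch5", 295_348, 366_353, 71_006, 25.0),
    ("Ch6", 5_878_476, 5_949_284, 70_809, 24.7),
    ("Ch7", 3_259_602, 3_354_300, 94_699, 26.5),
]


def span_length(start, end):
    """Length of an inclusive, 1-based coordinate range."""
    return end - start + 1


def at_percent(gc_percent):
    """AT content is the complement of GC content."""
    return round(100.0 - gc_percent, 1)


lengths = {name: span_length(start, end) for name, start, end, _, _ in TABLE_1}
at_values = {name: at_percent(gc) for name, _, _, _, gc in TABLE_1}
total_centromere_bp = sum(lengths.values())

for name, start, end, reported, gc in TABLE_1:
    # Each recomputed length should match the Length column.
    assert lengths[name] == reported
    print(f"{name}: {lengths[name]:,} bp, AT {at_values[name]}%")
```

The recomputed lengths and AT values span the same ranges quoted in the text (71 kb to 101 kb, and 72.1% to 75.5% AT).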

A BLAST search using a set of repeat sequences from the Br48 genome (Supplementary Table S1) identified abundant TEs and unknown repeats within the Br48 centromeres, in contrast to the relatively few TEs detected in P. oryzae centromeres by Yadav et al. (2019). This discrepancy likely arises because their analysis was restricted to the repeat elements documented at that time. The repeat content of the Br48 centromeres totaled 73.6% but varied moderately across chromosomes, ranging from 63.4% on chromosome 5 to 85.0% on chromosome 7 (Table 1). The largest group of repeats consists of LTR retrotransposons, which comprise 64.3% of the centromeric sequences in Br48. The three most abundant elements are RETRO5 (Farman et al., 2002), Br48_LTR_Retro2 and Br48_LTR_Retro5, accounting for 17.3%, 15.2% and 11.7% of the centromeric sequences, respectively (Fig. 2).

Fig. 2. Repeat content and organization of the Br48 centromeres. Repeat sequences identified in the Br48 genome were mapped to the Br48 centromeres. Each repeat is shown in a different color as illustrated below the centromeres. Repeat content percentages are provided in Table 1.

To assess the preference of each element for centromeres, we calculated the fold difference between its actual occurrence in centromeres and the expected value based on the whole genome sequence. Among the 42 elements analyzed, 27 were absent in centromeres, while 12 were overrepresented by approximately 10-fold or more (Fig. 3A), highlighting a broad diversity in centromere preference among the elements. Notably, CenIR (centromere-associated inverted repeat) and Br48_LTR_Retro5 elements exhibited a remarkable preference for centromeres, being overrepresented by more than 50-fold.

Fig. 3. Centromeric preference of a subset of repeats in P. oryzae. (A) Fold enrichment of repeat elements in centromeres relative to their expected values based on the whole genome sequence. Fold enrichment was calculated using the total nucleotide count showing sequence similarity to each element, as identified by a BLAST search. The graph includes only repeat elements detected in centromeres. (B) Distribution pattern of centromere-associated inverted repeat (CenIR) elements across Br48 chromosomes. Arrows indicate the orientation of CenIRs, with their sizes roughly proportional to the relative sizes of CenIR copies but not to the chromosomes. Gray regions on the chromosomes mark the centromere positions. Due to the enlarged view, arrow positions do not accurately reflect the genomic locations of the elements.
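
The fold enrichment described for Fig. 3A can be sketched as the fraction of centromeric nucleotides matching an element divided by the fraction of the whole genome matching it. This is one plausible reading of the calculation, not the authors' script. The total centromere length is the sum of the lengths in Table 1; the genome size and the per-element nucleotide counts below are hypothetical placeholders chosen only for illustration.

```python
CENTROMERE_BP = 574_661   # sum of the centromere lengths in Table 1
GENOME_BP = 45_000_000    # hypothetical placeholder, not a value from this study


def fold_enrichment(element_bp_in_cen, element_bp_in_genome,
                    cen_bp=CENTROMERE_BP, genome_bp=GENOME_BP):
    """Observed centromeric fraction divided by the genome-wide fraction.

    Equivalent to observed bp / expected bp, where the expected value is
    element_bp_in_genome * cen_bp / genome_bp.
    """
    if cen_bp <= 0 or genome_bp <= 0 or element_bp_in_genome <= 0:
        raise ValueError("lengths and genome-wide counts must be positive")
    observed_fraction = element_bp_in_cen / cen_bp
    expected_fraction = element_bp_in_genome / genome_bp
    return observed_fraction / expected_fraction


# Hypothetical element: 20 kb genome-wide, half of it inside centromeres.
enriched = fold_enrichment(10_000, 20_000)

# Hypothetical element that is absent from centromeres.
absent = fold_enrichment(0, 20_000)

print(f"enriched element: {enriched:.1f}-fold, absent element: {absent:.1f}-fold")
```

Because centromeres make up only a small share of the genome, an element with even a modest fraction of its copies in centromeres yields a large fold value under this definition.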

Br48_LTR_Retro5 is an LTR retrotransposon belonging to the GYPSY superfamily, sharing sequence similarity with Grasshopper, which was originally identified in finger millet strains of P. oryzae (Dobinson et al., 1993). In Arabidopsis, LTR pairs of ATHILA retrotransposons exhibit higher sequence similarity within centromeres compared to those outside, suggesting that centromeric ATHILA copies represent relatively recent insertions (Naish et al., 2021). To determine whether a similar pattern occurs in P. oryzae, sequence identities of LTR pairs were analyzed for Br48_LTR_Retro5 and six other LTR retrotransposons located both inside and outside centromeres (Supplementary Fig. S2). Overall, the sequence identities of LTR pairs tended to be lower for copies inside centromeres than for those outside, across all examined elements, suggesting that LTR retrotransposons in centromeres are relatively old in P. oryzae, contrary to ATHILA in Arabidopsis.
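
The logic behind the LTR comparison is that the two LTRs of a retrotransposon are identical at the time of insertion and diverge afterward, so lower identity within a pair implies an older insertion. A minimal sketch of the identity calculation follows; it assumes the two LTRs have already been pairwise aligned (gaps written as "-") and that gap columns are ignored. The source does not state how identity was computed, and the sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented 5' and 3' LTR fragments of one hypothetical element copy.
ltr_5prime = "ACGTACGT-A"
ltr_3prime = "ACGTTCGTGA"
identity = percent_identity(ltr_5prime, ltr_3prime)
print(f"LTR pair identity: {identity:.1f}%")
```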

The CenIR is a non-coding and highly AT-rich (79.6% on average) sequence approximately 530 bp in length. This element, including fragmented copies, is present in every Br48 centromere, often forming inverted repeat structures with two copies arranged in reverse orientation (Fig. 3B). Phylogenetic analysis revealed two major subclasses of CenIR elements in P. oryzae (Supplementary Fig. S3). Interestingly, in all analyzed IR pairs in Br48, one element from each subclass pairs to form an IR structure. This suggests that the IR structure represents a functional unit of the CenIR with slightly divergent sequences between the inverted repeats. However, it remains unclear whether CenIRs contribute to the function or formation of centromeres, as they form IR structures at centromeres only on some chromosomes in Br48, while also forming similar structures in non-centromeric regions (Fig. 3B).
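
One way to flag candidate IR structures from a list of annotated CenIR copies is to look for neighboring copies on opposite strands. The sketch below is illustrative only: the Copy record, the subclass labels "A" and "B", the coordinates and the max_gap threshold are all hypothetical, since the source does not specify a distance criterion for "adjacent".

```python
from typing import NamedTuple


class Copy(NamedTuple):
    start: int      # 1-based inclusive start on the chromosome
    end: int        # 1-based inclusive end
    strand: str     # "+" or "-"
    subclass: str   # hypothetical subclass label, e.g. "A" or "B"


def inverted_pairs(copies, max_gap=1000):
    """Return neighboring copies on opposite strands separated by at most max_gap bp."""
    ordered = sorted(copies, key=lambda c: c.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1
        if left.strand != right.strand and 0 <= gap <= max_gap:
            pairs.append((left, right))
    return pairs


# Invented copies on one chromosome.
copies = [
    Copy(1_000, 1_529, "+", "A"),
    Copy(1_600, 2_129, "-", "B"),
    Copy(50_000, 50_400, "+", "A"),  # isolated, truncated copy
]

pairs = inverted_pairs(copies)
for left, right in pairs:
    mixed = left.subclass != right.subclass
    print(f"IR candidate at {left.start}-{right.end}, mixed subclasses: {mixed}")
```

Checking whether each pair combines the two subclasses, as in the last loop, mirrors the observation that one element from each subclass pairs to form an IR structure.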

We next compared putative centromere sequences across phylogenetically distinct P. oryzae strains, including the rice-infecting strain Guy11, the finger millet-infecting strain MZ5-1-6, the perennial ryegrass-infecting strain LpKY97 and two wheat-infecting strains, B71 and Br48. A previous study revealed that wheat-infecting strains arose from perennial ryegrass strains (Inoue et al., 2017), and thus are phylogenetically closest to these strains, followed by finger millet strains and then rice strains (Gladieux et al., 2018; Asuke et al., 2023). Figure 4 shows a comparison of the centromere sequences among P. oryzae isolates, with closed circles indicating the locations of CenIRs, including truncated elements. With the exception of chromosome 6 in LpKY97, CenIRs are present on all chromosomes in the P. oryzae strains. Similar to the distribution pattern in the Br48 genome, approximately two-thirds of the CenIR copies were detected within a centromeric region, including possible pericentromeres, in all strains examined here.

Fig. 4. Synteny analysis of centromeres and their flanking regions among phylogenetically distinct P. oryzae isolates. The analysis includes five P. oryzae strains: Br48 and B71 (wheat isolates), LpKY97 (perennial ryegrass isolate), MZ5-1-6 (finger millet isolate) and Guy11 (rice isolate). Centromeres in LpKY97, MZ5-1-6 and Guy11 were predicted using the method described by Yadav et al. (2019). Approximately 25 kb of upstream and downstream flanking sequences were analyzed. The sequence order reflects phylogenetic proximity to Br48, except for chromosome 6*, where the order was adjusted to align Guy11 and Br48 for comparison (see text). The color gradients associated with these comparisons represent the percent sequence identity levels in BLAST alignments, with red and blue indicating regions whose sequences have the same and the opposite orientation, respectively. Closed circles mark the genomic locations of CenIR elements, with colors indicating their copy number.

Sequence analysis revealed that structural changes in the centromeres are often caused by insertions of TEs, as previously reported (Yadav et al., 2019), but in some cases by more complex rearrangements. Overall, centromere structural variation tended to be more extensive in more phylogenetically distant fungal strains. However, this is not always the case when focusing on individual chromosomes. For instance, the centromere structure on chromosome 3 differs markedly even between two wheat strains, Br48 and B71, while it is well conserved between B71 and LpKY97, as well as between B71 and MZ5-1-6. Similarly, the centromere structure on chromosome 6 is relatively well conserved between wheat strains and the rice isolate Guy11, whereas more closely related strains, LpKY97 and especially MZ5-1-6, show notably distinct centromere structures (Chromosome 6* in Fig. 4).

To further investigate the evolution of centromere sequences in P. oryzae, base substitution rates were analyzed in the centromeres and their flanking regions, extending approximately 25 kb on either side. Using the B71 strain sequences as a reference, base substitutions in BLAST alignments were counted, excluding gaps from the analysis. Since only sequences syntenic to the B71 sequences and oriented in the same direction were subjected to the analysis, the centromeres of chromosome 5 in Guy11 and chromosome 6 in MZ5-1-6 were excluded due to insufficient syntenic sequences. In general, base substitution rates in the centromeres were over ten times higher than in the flanking regions (Fig. 5), and rates in both centromeres and flanking regions tended to increase in more phylogenetically distant fungal strains. However, similar to structural variation, these rates varied considerably across chromosomes within each strain, suggesting asynchronous evolutionary rates across chromosomes.

Fig. 5. Base substitution rates across centromeres and their flanking regions among P. oryzae isolates. A BLAST search was performed with default parameter settings against the genome sequences of Br48 (wheat isolate), LpKY97 (perennial ryegrass isolate), MZ5-1-6 (finger millet isolate) and Guy11 (rice isolate) using the centromere sequences and their flanking regions of B71 as queries. Base substitutions in the resulting BLAST alignments were counted, excluding gaps from the analysis. Substitution rates were calculated by dividing the number of base substitutions by the total base numbers of the aligned sequences excluding gaps. Data for Br48, LpKY97, MZ5-1-6 and Guy11 are represented in yellow, orange, pink and purple, respectively. Fractions with slanted lines, grid lines and dotted patterns in the bar graphs represent T:A to C:G transitions (relative to B71), C:G to T:A transitions (relative to B71) and other substitutions, respectively. ND, not determined due to the insufficiency of aligned sequences from the BLAST analysis.
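
The counting scheme described for Fig. 5 can be sketched as follows: walk through an alignment column by column, skip gaps, and classify each mismatch relative to the reference (B71) as a C:G to T:A transition, a T:A to C:G transition, or another substitution. This is a minimal illustration rather than the script used in the study; the aligned strings are invented, and skipping ambiguous bases such as N is an added assumption.

```python
# Substitutions written as (reference base, other base), covering both strands.
CG_TO_TA = {("C", "T"), ("G", "A")}
TA_TO_CG = {("T", "C"), ("A", "G")}


def substitution_summary(ref_aln, other_aln):
    """Count substitution classes and the overall rate, excluding gap columns."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"CG_to_TA": 0, "TA_to_CG": 0, "other": 0}
    aligned = 0
    for r, o in zip(ref_aln.upper(), other_aln.upper()):
        if r not in "ACGT" or o not in "ACGT":
            continue  # skip gaps and ambiguous bases
        aligned += 1
        if r == o:
            continue
        if (r, o) in CG_TO_TA:
            counts["CG_to_TA"] += 1
        elif (r, o) in TA_TO_CG:
            counts["TA_to_CG"] += 1
        else:
            counts["other"] += 1
    rate = sum(counts.values()) / aligned if aligned else 0.0
    return counts, aligned, rate


# Invented alignment: reference (B71-like) versus another strain.
ref = "ACGTCCGA-TA"
other = "ATGTCTAAGCC"
counts, aligned, rate = substitution_summary(ref, other)
print(counts, aligned, rate)
```

Keeping the two transition classes separate makes it possible to compare their ratio across chromosomes, which is the quantity examined in Supplementary Fig. S4.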

In all strain combinations, the vast majority of base substitutions were C:G to T:A and T:A to C:G transitions (Fig. 5). Between B71 and MZ5-1-6, these two classes of transitions together accounted for 73.1% of the base substitutions in the centromeres and 61.1% in the flanking regions, suggesting the operation of repeat-induced point mutation (RIP). RIP, originally identified in the fungus Neurospora crassa, refers to extensive, directional C:G to T:A transitions in duplicated sequences during a specific stage of the sexual cycle (Cambareri et al., 1989). In P. oryzae, RIP is an active process at least in some strains, including Br48 (Ikeda et al., 2002). Thus, the operation of RIP in both compared strains could explain the prevalence of C:G to T:A and T:A to C:G transitions. However, the ratio of C:G to T:A transitions to T:A to C:G transitions varies considerably across chromosomes in all examined strains (Supplementary Fig. S4). Since RIP is believed to act on duplicated sequences throughout the genome, these asynchronous RIP-like transitions across chromosomes appear to contradict this simple model. At least some of the C:G to T:A transitions may be due to spontaneous deamination of 5-methylcytosine, which forms thymine, as reported in many organisms (Fryxell and Zuckerkandl, 2000).

The apparent punctuated mutations in P. oryzae centromeres might be explained by frequent exchanges of centromeres between distantly or closely related P. oryzae strains through mating or parasexual recombination, although the sexual stage of this fungus is only observed under artificial conditions. Nevertheless, the Br48 strain is a potential hybrid between a wheat-infecting strain and a Brachiaria-infecting strain (Inoue et al., 2017; Kobayashi et al., 2023). The centromeres on chromosomes 3 and 7, which differ substantially from those in another wheat strain, B71, are likely derived from the Brachiaria strain (Kobayashi et al., 2023). A similar scenario might apply to the centromere on chromosome 6 in LpKY97. However, in chromosome 7 of B71, higher levels of base substitutions were detected in the centromere but not in either of the flanking regions, which does not align well with the above recombination model. Thus, it is tempting to assume that a RIP-like mechanism locally accelerates mutations in P. oryzae centromeres, potentially triggered by genomic rearrangements and/or TE insertions, as the observed base substitution rates seem to correlate with the extent of structural changes (Figs. 4 and 5).

MATERIALS AND METHODS

Chromatin immunoprecipitation-seq analysis

A ChIP assay was conducted using the ChIP-IT Express kit (Active Motif, USA) according to the manufacturer’s protocol as previously described (Kobayashi et al., 2023). Fungal mycelia were cultured in CM liquid medium for 4 days at 26 °C on an orbital shaker set to 120 rpm. A 10-mg portion of mycelia was harvested and incubated at room temperature for 15 min in 10 ml of phosphate-buffered saline containing 1% formaldehyde (w/v). Chromatin was sheared by sonication using a Bioruptor apparatus (Cosmo Bio, Japan) for three cycles of 1 min on at high intensity (200 W) and 30 s off, followed by five cycles of 1 min on at medium intensity (160 W) and 30 s off. The sheared chromatin fragments ranged from approximately 100 to 500 bp, as determined by agarose gel electrophoresis. Antibodies against CENP-A-like protein in Br48 (ortholog of MGG_06445) were raised against an N-terminal synthetic peptide (MPPQKVKKAGAKKTV) fused with KLH carrier protein. ChIP DNA was recovered using phenol–chloroform extraction and ethanol precipitation. Libraries for high-throughput sequencing were prepared using kits provided by the manufacturer of the sequencing platform and sequenced on the HiSeq X platform at Macrogen Japan.

Data analysis

Short reads obtained from ChIP-seq and MeDIP-seq analysis were mapped to the Br48 genome using CLC Genomics Workbench ver. 11.0.1. Centromeres were detected using CENP-A-ChIP reads and the HOMER “findPeaks” command with the parameters “-size 500” and “-minDist 10000” (Heinz et al., 2010). BLAST searches were performed with an e-value cut-off of 1e-20. To visualize syntenic regions in centromeres across P. oryzae strains, Easyfig software (Sullivan et al., 2011) was used with the parameters “Min. length = 2000” and “Max e Value = 0.001”.
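
The e-value cut-off stated above can be applied as a simple post-filter on BLAST results. The sketch below assumes tab-separated output with the standard twelve columns (e-value in the eleventh), which is an assumption because the source does not specify the output format used. The function name and the example hits are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-20  # cut-off stated in the text


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Keep rows whose e-value (column 11, 0-based index 10) is at or below the cut-off."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= cutoff:
            kept.append(row)
    return kept


# Invented hits: query, subject, % identity, length, mismatches, gap opens,
# query start/end, subject start/end, e-value, bit score.
example = "\n".join([
    "repeatA\tchr1\t95.2\t530\t25\t1\t1\t530\t10000\t10529\t3e-45\t180",
    "repeatA\tchr2\t88.0\t120\t14\t0\t1\t120\t5000\t5119\t1e-08\t60.2",
    "repeatB\tchr3\t99.1\t2000\t18\t0\t1\t2000\t700\t2699\t0.0\t3500",
])

kept = filter_hits(example)
print([row[1] for row in kept])
```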

CONFLICTS OF INTEREST

The authors declare no conflicts of interest.

DATA AVAILABILITY

The ChIP-seq data from this study have been deposited in the DDBJ Sequence Read Archive under the accession numbers PRJDB15953 and PRJDB20151.

ACKNOWLEDGMENTS

This work was supported by a Grant-in-Aid for Scientific Research (B) from the Japan Society for the Promotion of Science (#24K01758).
