Gene duplication , silencing and expression alteration govern the molecular evolution of PRC 2 genes in plants

PRC2 genes were analyzed for their number of gene duplications, dN/dS ratios and expression patterns among Brassicaceae and Gramineae species. Although both amino acid sequences and copy number of the PRC2 genes were generally well conserved in both Brassicaceae and Gramineae species, we observed that some rapidly evolving genes experienced duplications and expression pattern changes. After multiple duplication events, all but one or two of the duplicated copies tend to be silenced. Silenced copies were reactivated in the endosperm and showed ectopic expression in developing seeds. The results indicated that rapid evolution of some PRC2 genes is initially caused by a relaxation of selective constraint following the gene duplication events. Several loci could become maternally expressed imprinted genes and acquired functional roles in the endosperm.


INTRODUCTION
Epigenetic regulation is important in the developmental process and the response to environmental change in diverse organisms (Holec and Berger, 2012).Epigenetic regulation mainly consists of the modification of chromatin status through both DNA methylation and histone modification.Polycomb repressive complex (PRC) is one of the major components of the histone methylation machinery.There are two types of PRC, PRC1 and PRC2.PRC2 group genes were first identified in Drosophila (Lewis, 1978) and subsequently found in mammals and plants (van der Lugt et al., 1994;Goodrich et al., 1997).PRC2 contains four core proteins in Drosophila: Enhancer of Zeste (En(z)), Supressor of Zeste 12 (Su(z)12), Extra Sex Combs (ESC) and p55 (Reyes and Grossniklaus, 2003).PRC2 mediates histone H3 lysine 27 methylation, leading to transcriptionally inactive chromatin structures in target genes.
These multiple complexes evolved by gene duplications and neo-functionalization of the duplicated components (Danilevskaya et al., 2003;Chanvivattana et al., 2004;Spillane et al., 2007;Haun et al., 2007).A well-known example is adaptive evolution of the MEA gene after Edited by Koji Murai * Corresponding author.E-mail: akiraka@cc.kyoto-su.ac.jpDOI: http://doi.org/10.1266/ggs.15-00055duplication of CLF/SWN.Conservation of original gene function in one of the duplicated genes allows the other to acquire a new functional role to regulate different developmental stages (Spillane et al., 2007).The age of gene duplication and orthology among genes from different species are still not completely solved.For example, Spillane et al. (2007) reported a duplicated origin of MEA from SWN, but Luo et al. (2009) reported a more ancient duplication of MEA from the ancestor of SWN and CLF.The differences and ambiguity may arise from the use of different data sets and from the inclusion of far distant species in Luo et al. (2009).In addition to the intragenomic variety of PRC2 complexes, intergenomic comparisons between species showed that the repertoire of genes in each complex was different, possibly contributing to the diversification of the epigenetic framework of gene regulation during the evolution of higher plant species.However, evolutionary analyses of PRC2 complex genes have been performed only in a restricted number of species and genera.In Arabidopsis, MEA was analyzed in detail to show the importance of natural selection acting on promoter regions (Kawabe et al., 2007;Miyake et al., 2009).The FIE gene in Zea mays was shown to be duplicated, and the resultant copies are regulated differently (Danilevskaya et al., 2003).
Recent developments in genome sequencing projects using next-generation sequencing technology allow us to assess evolutionary processes of full sets of genes of interest for many species.In this study, we analyzed components of PRC2 in Brassicaceae and Gramineae species.Brassicaceae includes the model species A. thaliana, and whole-genome sequences of more than 10 species are available.Gramineae also contains many species whose whole-genome sequences have been obtained.By comparing two relatively divergent plant families, general mechanisms determining evolutionary processes in the PRC2 complex genes will be discussed.

Molecular evolutionary analyses Obtained
nucleotide sequences were aligned to A. thaliana or O. sativa sequences.After alignment of genomic sequences, CDS regions were extracted.Sequence alignments were checked by manual inspection to delete misaligned regions.The Raphanus sativus genome sequence is not well assembled, and thus a sequence corresponding to one locus is sometimes scattered among several scaffolds.Such genes were reconstructed by extracting partial sequences from several scaffolds and concatenating the CDS regions.Partial sequences homologous to PRC2 genes were not included in the following analyses.Aligned CDSs were used for phylogenetic analyses.Because orthology among genes from different species is still inadequately understood, sequences from Brassicaceae and Gramineae were analyzed separately.When there were closely related gene groups, phylogenetic trees were constructed including all sequences from such gene groups to determine the root of these groups.Except for Gramineae EMF2a and EMF2b, the gene groups were clearly separated from each other.As previously reported (Chen et al., 2009), Gramineae EMF2 showed unclear topology, particularly regarding the position of Oryza OsEMF2a.For Gramineae EMF2a and EMF2b, we constructed a tree with date palm sequence to determine the root and thus to separate family members (Supplementary Fig. S1).Synonymous divergences (d S ) were estimated by the Nei and Gojobori method (Nei and Gojobori, 1986) and used to make neighbor-joining trees (Saitou and Nei, 1987) by MEGA ver 5.2.2 (Tamura et al., 2011).Based on the topology of the d S trees, omega (ω = d N /d S ) values of branch-by-branch in each tree were estimated by the maximum likelihood method implemented in the program package PAML ver.4.8a (Yang, 2007).Because sequences of Oryza species always made a monophyletic clade and the total length of d S was reasonably small, we used the sequence of O. sativa as a representative of the Oryza clade.
Expression data analyses Previously, Davidson et al. (2012) performed a transcriptome analysis of reproductive and vegetative tissue of three Gramineae species, Brachypodium distachyon, O. sativa and Sorghum bicolor, using the RNA-sequencing method.These gene expression data were used in the present study.Comprehensive analysis of RNA-sequencing data from various plant tissues of Brassica rapa was performed by Tong et al. (2013); these expression data were also used here.Expression data for A. thaliana, Hordeum vulgare and Z. mays genes were obtained from the Genevestigator site (https://www.genevestigator.com/gv/)using Anatomy analysis tools.All microarray data available at Genevestigator were used, although some genes did not appear in the expression data and thus were not available.For comparison of gene expression among dif-Molecular evolution of PRC2 genes in plants ferent tissues, the level of expression is shown as a percentage of the maximum expression among all the RNA-sequencing and microarray results.

Homologs of PRC2 genes in Brassicaceae and
Gramineae species A BLAST search identified homologs of PRC2 genes in each species (Supplementary Table S2), which were then used for phylogenetic analyses (Figs. 1  and 2).The number of loci was generally similar among species, but occasionally differed (Table 1).For some components, independent occurrences of gene duplication in each species resulted in the same number of duplicated genes (e.g., MEA in B. rapa and Leavenworthia alabamica).
Some genes were not identified in several species.MEA and FIS2 could not be identified in T. hasseleriana and Aethionema arabicum, possibly because rapid evolution thwarted homology searching using A. thaliana sequence.Alternatively, because these two species are from sister families or are species branching at the basal position of Brassicaceae evolution, gene duplication between MEA/FIS2 and their homologs might have occurred after splitting of these and other species.The EMF2a gene was also not identified in some Gramineae species such as Z. mays and Setaria italica.
Due to an ancient genome triplication event (Yang et al., 1999;Town et al., 2006), B. rapa, B. oleracea and R. sativus contained multiple copies of genes (Table 1) in each A. thaliana locus group.There were also partial sequences similar to PRC2 genes in several species.These sequences were usually highly similar to active gene sequences of the same species, but were truncated or completely disrupted to become non-functional copies.The pattern of truncation and disruption suggests rapid degradation after gene duplication in each species lineage.

Evolution of En(z) group (CLF, SWN and MEA)
There are three En(z) group loci in A. thaliana (CLF, SWN and MEA) and two in O. sativa (OsSET24 and OsiEZ1).Among these, all loci except MEA in Brassicaceae were conserved in both sequence and copy number for each branch (Figs. 1 and 2).In B. rapa, B. oleracea and R. sativus, there was only one copy for CLF and SWN although these three species had experienced genome triplication.The CLF and SWN genes of these three species showed no partially similar sequences in the genome, suggesting that genes duplicated by wholegenome duplication (WGD) events have been lost completely from their genomes.MEA, known as an imprinted gene in A. thaliana, showed higher ω values of over 0.5 in some lineages, although there was no clear evidence of adaptive evolution (i.e., ω > 1).

Evolution of Su(z)12 group (FIS2, EMF2 and VRN2)
The Su(z)12 group contains three loci in A. thaliana (FIS2, EMF2 and VRN2) and two in O. sativa (OsEMF2a and OsEMF2b).Similar to the En(z) group, two loci, EMF2 and VRN2, in Brassicaceae and one locus (EMF2b) in Gramineae showed conserved evolution among all species' lineages (Figs. 1 and 2).There are several exceptions, especially in B. rapa, B. oleracea and R. sativus, which have experienced whole-genome triplication.For EMF2 and VRN2, these species have one conserved locus, whereas other loci tend to have high ω values.The FIS2 locus in Brassicaceae showed high ω values (sometimes > 1), suggesting the influence of adaptive evolution.The FIS2 locus is also known as an imprinted gene, as is the MEA gene.In Gramineae species, one locus (the EMF2a group) showed high ω values, especially in non-Oryza species.FIS2 and EMF2a both showed high ω values and a lack of genes in several species, suggesting that these genes have evolved to have specific functions or that they lack selective constraints.

Evolution of p55 and Esc groups (MSI1 and FIE)
The p55 group genes were highly conserved except in Brassica and Raphanus.B. rapa, B. oleracea and R. sativus had three copies of MSI1 homoeologous genes (Fig. 1).One of the MSI1 copies was conserved (ω < 0.2), while the other two loci showed high ω values, which suggests that these two loci are under adaptive evolution or under low selective constraint.Because of high ω values in internal branches and low ω values in some external branches, negative selection after neo-functionalization rather than loss of selective constraint could have occurred in these genes.
The FIE genes in Brassicaceae species showed a pattern similar to Brassicaceae MSI1 genes: B. rapa, B. oleracea and R. sativus have multiple copies with one highly conserved locus (Fig. 1).Other loci showed high ω values in internal branches, suggesting neo-functionalization.
The FIE gene in Gramineae species experienced independent gene duplication in several taxa (Fig. 2).O. sativa has two copies, of which one showed a slightly higher ω value than the other.B. distachyon has four copies, of which two appear to be under conserved negative selection.Z. mays and S. bicolor have two copies, possibly duplicated before the divergence of these two species.One of the duplicates of FIE was conserved but the others had higher ω values (ω ≥ 0.4).Although the reason is not clear, FIE genes in Gramineae species experienced frequent duplication events.After gene duplication, at least one copy retained its original function under negative selection.

Relation to expression patterns
The PAML analyses clearly showed that there are different patterns of  S2.S2.
conservation among genes and among species.The conservation levels are related to the expression pattern of each locus.Fig. 3 summarizes the relationship between conservation of CDSs and expression patterns.Genes with conserved evolutionary patterns generally showed a wide range of expression patterns (although SWN, SET24  S3.  (Choi et al., 2002;Gehring et al., 2006;Hsieh et al., 2009Hsieh et al., , 2011)).

Evolution of PRC2 genes in Oryza species
As well as O. sativa, there are six additional Oryza species with publicly available whole-genome sequences.The PRC2 genes were also analyzed for these seven Oryza species.All seven species have the same number of loci as O. sativa.There are no novel stop codons or frameshift mutations, suggesting highly conserved negative selection acting on these genes.Candidates for genes under adaptive evolution were not found from the PAML analyses.

DISCUSSION
High level of conservation of PRC2 genes PRC2 genes were generally well conserved in amino acid sequence and copy number among species.The average ω value is around 0.2 in plants (e.g., Lawton-Rauh et al., 1999;Tiffin and Hahn, 2002;Yang and Gaut, 2011).Some branches in phylogenetic trees of PRC2 genes, such as CLF, FIE and MSI1 of Brassicaceae or RBAP3 of Gramineae, showed ω values of less than 0.1.These genes are expressed in a broad range of tissues and known to have important regulation targets.The functional importance of PRC2 complex genes may cause highly conserved evolutionary processes among diverged species.PRC2 functions as a complex of four components, and such multi-component interactions increase selective constraint (Fraser et al., 2002).In addition, multiple target genes in multiple tissues also tend to be under high selective constraint (Duret and Mouchiroud, 2000;Wright et al., 2004;Zhang and Li, 2004;Yang and Gaut, 2011).Despite the fact that some species have experienced WGD, CLF and SWN were retained as a single gene in most species, suggesting highly conserved dosage regulation.Although both En(z)-homologous genes, i.e., CLF and SWN, were present as single-copy genes in B. rapa, B. oleracea and R. sativus, there are multiple copies of the other three PRC2 component genes.This suggests the importance of protein amount of the CLF and SWN gene products compared to other PRC2 component proteins.The methyltransferase function of the En(z) family may be regulated strictly to avoid perturbation.

Gene duplication-mediated rapid evolution
The high ω values observed in PAML analyses could be classified into three types based on the pattern of gene duplication.The first pattern of elevated ω is related to occasional gene duplications (e.g., Zmay0606 in Gramineae RBAP3; Bdis1, Bdis3 and Sbic2 in Gramineae FIE) (Figs. 1 and 2, red and blue lines).In these cases, one of the duplicated genes showed an elevated ω value, while the other retained a relatively low ω value.A similar pattern of rapid evolution was observed in genes duplicated by WGD.We discuss possible mechanisms of rapid evolution below.
The second type is gene group-wide high ω values related to ancient gene duplication events.MEA and FIS2 genes tend to show high ω values in many branches (Figs. 1 and 2, red and blue lines).Both genes are known to be imprinted and are expressed during seed developmental stages.MEA evolved adaptively after gene duplication from SWN or its ancestor (Spillane et al., 2007;Luo et al., 2009).The commonly observed high ω values in MEA and FIS2 suggest either low selective constraint or rapid evolution with recurrent adaptive changes, as proposed by the conflict hypothesis (Haig and Westoby, 1989;Kinoshita, 2007).Similar gene groupwide high ω values were observed in the Gramineae EMF2a gene.Although EMF2a is a non-imprinted gene (Luo et al., 2009) and there is no information about novel function for the EMF2a group genes, low selective constraint or high adaptive evolution could cause high ω values in this gene.In MEA and FIS2 of Brassicaceae and EMF2a of Gramineae, a gene duplication event occurred at a relatively ancient time, possibly before the establishment of each family.After this gene duplication event, the function of the original gene was retained in one of the two loci and relaxed constraint allowed the other locus either to lose its function or to acquire a partially or completely novel function.The early stage of such evolution would be similar to the first pattern described above.The third pattern observed in B. rapa, B. oleracea and R. sativus is related to WGD.These three species experienced whole-genome triplication (WGT) (Yang et al., 1999;Town et al., 2006), as a result of which they have multiple loci for each gene.Six genes in the eight analyzed gene groups are found to have more than two copies in these three species.In Brassica and Raphanus species, at least one of the duplicated genes showed low ω values, suggesting that these genes are under negative (purifying) selection, while the other copies showed high ω values (Figs., 1 and 2, red and blue lines).
In all three patterns, one locus may have retained original functions while the other duplicated copies evolved rapidly at the amino acid level.The high ω values suggest recurrent adaptive evolution or a lack of selective constraint.Although we could not distinguish whether adaptive evolution or low selective constraint caused an accelerated evolutionary rate for the PRC2 genes after gene duplication, duplicated genes sometimes acquire novel functions, such as target gene changes or different interaction patterns among components.In theoretical studies of gene duplication, the effects of duplications on molecular evolution have been investigated (reviewed in Innan and Kondrashov, 2010).All models predict an elevation of ω values (a high rate of amino acid changes).
The evolutionary fate of genes after duplication, namely pseudogenization and neofunctionalization, may be influenced by epigenetic silencing of the duplicates (Rodin and Riggs, 2003).
A model for the establishment of genomic imprinting after gene duplication Gene duplication events could have been important evolutionary forces for the PRC2 genes in plants.Brassicaceae species, especially those that experienced WGT, showed a very clear pattern, although the pattern could be found only in FIE among the Gramineae species.The rapid evolution found in this study correlated with gene duplication and subse-quent expression pattern alteration.Expression pattern changes generally led to reproductive stage-specific expression (Fig. 3).This reproductive stage-specific limited expression could be generated by gene silencing that leads to reduced dosage and reactivation by DNA glycosylases.Although it would be necessary to analyze DNA methylation patterns in each tissue to test this hypothesis, the silencing-reactivating process could cause expression patterns similar to genomic imprinting.Genomic imprinting, a phenomenon of uni-parental gene expression, is found mostly in the endosperm in plants (e.g., Gehring et al., 2009;Hsieh et al., 2011;Luo et al., 2011;Waters et al., 2011Waters et al., , 2013;;Zhang et al., 2014).Genomic imprinting is regulated by DNA and histone methylation in endosperm.The demethylation of DNA in central cells by DNA glycosylase occurs in the maternal genome but not during pollen development, causing maternally activated imprinted genes (Gehring et al., 2009;Wolff et al., 2011).The silencing of maternal alleles by histone methyltransferase (included in the FIS2-complex in A. thaliana) establishes paternally imprinted genes (Köhler et al., 2012).The situation of maternally expressed genes (MEGs) could be established by the silencing-reactivating process in the PRC2 genes.After gene duplication, gene silencing may occur by DNA methylation to adjust the excess gene dosage (Keller and Yi, 2014).During the reproductive process, maternal alleles may be demethylated in the endosperm by DNA glycosylase and thus become expressed, whereas paternal alleles remain silenced.The maternal expression and paternal silencing would thereby establish the gene expression pattern as maternally expressed imprinted genes.
In previous studies, the rapid evolution of imprinted genes was interpreted as being directly related to the imprinted status of the genes (Haig and Westoby, 1989;Haig, 2013).However, the high ω values of limitedexpression loci do not always need to be explained by adaptive evolution related to the conflict hypothesis of genomic imprinting.The elevation of ω values could be caused by either lower selective constraint or adaptive evolution including neo-functionalization.Gene silencing of one copy after gene duplication, with a functionally active copy, may cause both lower selective constraint and adaptive evolution (Innan and Kondrashov, 2010).There are high ω values of > 1 in several loci such as EMF2 and FIE of Brassica and Raphanus (Fig. 1, blue lines).These genes may have experienced adaptive evolution.We hypothesized that genes with functions fitted to endosperm development have evolved endosperm-specific functions.The process of becoming an endosperm-specific gene may be reflected as a high ω value.The PRC2 genes may have functions in the endosperm to regulate target genes through histone methylation (Makarevich et al., 2006;Bemer and Grossniklaus, 2012).If affinities Molecular evolution of PRC2 genes in plants for target genes or proportions of methylation sites differ between duplicated loci and cause a different phenotype of endosperm development, PRC2 gene loci may have become permanently imprinted genes.Although functional differentiation needs to be confirmed, accelerated evolution is frequently observed in imprinted genes after gene duplication, suggesting that duplication events can initiate expression profile changes and imprinting (Qiu et al., 2014).
We previously reported that gene duplications have important roles for rapid evolution and imprinting status in the type I MADS-box genes in Arabidopsis species (Yoshida and Kawabe, 2013).In the type I MADS-box gene family, both high ω values and multiple gene duplications were found in gene clusters with imprinted genes.However, our results suggested that imprinted status is not the main cause of either gene duplications or high ω values.Rather, the gene duplications themselves could result in the rapid evolution of duplicated genes, including imprinted genes.Moreover, we discussed the possibility that a large number of imprinted genes could be preferentially established from such duplicated gene families (Yoshida and Kawabe, 2013).The rapid evolution and expression alteration after gene duplication events found in the PRC2 genes in this study also suggest that gene duplication has very important roles in the evolution of genomic imprinting.
In this study, we propose that genomic imprinting can be established by gene duplication and by gene silencing and reactivation processes.It should be noted that these processes can only explain maternally expressed imprinted genes.It is possible to confirm the establishment of imprinted status by analyzing the patterns of both DNA methylation and expression among duplicates.

Fig. 2 .
Fig. 2. Phylogenetic trees of PRC2 components in Gramineae.Phylogenetic relationships in Gramineae species are shown as described in Fig. 1.Species names are abbreviated as Bdis: Brachypodium distachyon; Sita: Setaria italica; Sbic: Sorghum bicolor; Zmay: Zea mays; Hvul: Hordeum vulgare.Homologs in each species of the six genes shown in this figure are identified in Supplementary TableS2.

Fig. 3 .
Fig. 3. Expression patterns of PRC2 components in Brassicaceae and Gramineae.Gene expression of PRC2 components in root (R), stem (S), leaf (L), inflorescence (IF), endosperm (En), embryo (Eb), silique (Si), callus (C), whole seed at 5 days after pollination (DAP) (S5) and whole seed at 10 DAP (S10).For a given gene and developmental stage, the level of expression is shown as a percentage of the maximum expression among the compared tissues.Genes with high ω (ω > 0.5) are indicated in boldface and shade.Asterisks (*) beside the gene names indicate imprinted genes.The expression level values from original investigations are summarized in Supplementary TableS3.

Table 1 .
Number of copies of PRC2 genes in the genome of each species

Table 2
summarizes the d N , d S and ω values in total branches from each gene tree.The results are similar to those for the genes described above.RBAP3 is highly conserved.One each of the EMF2 and FIE loci had a relatively high ω value compared to the other locus.The loci with high ω values are those showing limited expression patterns in O. sativa.

Table 2 .
Summary of d N /d S values in the A genome of Oryza species Values are estimated as total branch length of the tree.