Spatial genetic and epigenetic structure of Thlaspi arvense (ﬁeld pennycress) in China

Thlaspi arvense (ﬁeld pennycress) is widespread in temperate regions of the northern hemisphere. We estimated the genetic and epigenetic structure of eight T. arvense populations (131 individuals) in China using ampliﬁed fragment length polymorphism and methylation-sensitive ampliﬁed polymorphism molecular-marker techniques. We detected low diversity at both genetic (mean = 0.03; total = 0.07) and epigenetic (mean = 0.04; total = 0.07) levels, while signiﬁcant genetic ( F ST = 0.42, P < 0.001) and epigenetic ( F ST = 0.32, P < 0.001) divergence was found across the distribution range. Using Mantel testing, we found spatial genetic and epigenetic differentiation, consistent with isolation-by-distance models. We also identiﬁed a strong correlation between genetic and epigenetic differentiation (r = 0.7438, P < 0.001), suggesting genetic control of the epigenetic variation. Our results indicate that mating system, natural selection and gene ﬂow events jointly structure spatial patterns of genetic and epigenetic variation. Moreover, epigenetic variation may serve as a basis of natural selection and ecological evolution to enable species to adapt to heterogeneous habitats. Our study provides novel clues for the adaptation of T. arvense .


INTRODUCTION
Thlaspi arvense L. (field pennycress), an annual diploid (2n = 2x = 14) herbal species of the tribe Thlaspideae (Brassicaceae), is widely distributed in temperate regions of the northern hemisphere. In China, it occupies a wide range of elevations, from sea level to approximately 4,000 m above sea level (Warwick et al., 2002;An et al., 2015). It can adapt to highly heterogeneous environments, and is thus a promising model species for exploring the mechanisms that underpin adaptability to diverse environments. Similar to its relatives in the Brassica-ceae, mutations in Flowering Locus C (FLC) resulted in two ecotypes: the spring-flowering habit (early flowering) is free from vernalization in the transition from vegetative to reproductive growth, whereas vernalization is necessary to initiate flowering for the winter-flowering habit (late flowering) (McIntyre and Best, 1978;Dorn et al., 2018). In recent years, the plant has attracted great attention, since T. arvense possesses excellent agronomic traits, and comparative genomics has revealed substantial synteny with well-studied plants. Furthermore, transgenic protocols for Arabidopsis thaliana have also worked well for T. arvense (Phippen and Phippen, 2012;Sedbrook et al., 2014;McGinn et al., 2019).
The draft genome of T. arvense was published in 2015 (Dorn et al., 2015); however, little has been reported on the distribution of genetic variation or the genetic structure of natural populations of T. arvense. Genetic varia-tion is common in nature; it serves as the substrate for natural selection, and has been an important indicator in assessing the persistence of species and the capability to adapt to changing environments (Fisher, 1930;Frankham et al., 2002). Genetic variation exists not only within populations, but also among populations. On one hand, T. arvense occupies wide distribution ranges, geographic populations are scattered in highly heterogeneous environments, and divergent selective pressures are likely to drive the adaptive evolution and differentiation between local populations. On the other hand, geographic distance, as well as geographic barriers, have obstructed the genetic connectivity between populations. More importantly, divergent flowering times may be another factor that contributes to population differentiation.
Genetic variation clearly underpins adaptation to ecological environments Yu et al., 2016;Ma et al., 2019;Zhang et al., 2019). Recent studies have also revealed divergent phenotypes in genetically uniform plant populations, as the consequences of epigenetic variation between individuals (Wilschut et al., 2016). Epigenetics refers to any process that has the potential to regulate gene expression and thus affect phenotype, but without any changes in DNA sequence; these include DNA methylation, small RNA expression and chromatin modifications (Richards, 2011;Deichmann, 2016). The most studied and heritable epigenetic modification is DNA methylation, where the 5-position of cytosine is modified by a methyl group in three sequence contexts: CG (or CpG), CHG and CHH (where H represents A, T or C) (Jones and Takai, 2001;Fulneček et al., 2002). In general, methylation of promoter and enhancer regions represses gene expression, whereas methylation of the gene body results in increased gene expression (Klose and Bird, 2006;Jones, 2012). The origin of epigenetic variation is mainly due to genetic variation, as a consequence of environmentally induced or inherited epimutation (Richards et al., 2017). Independent of genetic variation, abundant DNA methylation variation contributes to phenotypic differentiation and ecological adaptation in animals and plants. For instance, differences in DNA methylation affect personality traits in the great tit (Parus major) (Verhulst et al., 2016). They also contribute to heritable variation in flowering time and adaptive divergence in the apomictic dandelion (Taraxacum officinale); this variation can be synchronized using a methyltransferase inhibitor (Wilschut et al., 2016). The DNA methylation pattern of alligator weed, inherited stably across generations in the field, was immediately altered following common garden transplantation (Shi et al., 2018). Variation in DNA methylation is readily influenced by the environment, and an accurate DNA methylation state is crucial to some biological and developmental processes ). An understanding of DNA methylation variation may provide new insights into ecological adaptation and evolution. As such, one may predict that natural populations under different selection regimes would differ in epigenetic states as a result of directional selection and evolutionary adaptation.
Knowing how genetic and epigenetic diversity are distributed over a large landscape scale is essential for understanding adaptive evolution, and is of great importance for germplasm utilization for T. arvense. Here, we applied amplified fragment length polymorphism (AFLP) and methylation-sensitive amplified polymorphism (MSAP) methods in T. arvense to address the following questions: (i) Are there genetic and epigenetic differentiation between populations? (ii) What is the relationship between genetic and epigenetic differentiation?
Since DNA methylation is differentiated between tissues and developmental stages, only the third and fourth true leaves were sampled when the plants started to flower, to ensure the samples were from the same developmental stage. Pairs of sampling sites were separated by a linear distance of more than 85 km. To avoid repeated sampling of individuals derived from the same mother plant or its relatives, we only sampled plants more than 30 m apart. Finally, a total of 131 samples were collected from the eight geographic populations through multiple sampling in 2017 (Table 1).
AFLP and MSAP protocol Total genomic DNA was extracted from dried leaves using a Qiagen DNeasy Plant Mini kit according to the manufacturer's protocol. The extracted DNA was tested by 1% agarose gel electrophoresis and then stored at − 80 °C for future use.
Population genetic diversity and structure were analyzed by AFLP according to the protocol of Vos et al. with some adjustments (Vos et al., 1995;Gao et al., 2010). We screened 10 EcoRI/MseI primer pairs to selectively amplify the following fragments in all samples: AA/CCC, AC/CAG, AC/CCT, AC/CGA, AC/CCC, AG/CGA, AG/CGC, AT/CCC, AT/CCG and AT/CGA (Supplementary Table S1). The MSAP method was applied to assess epigenetic variation within and between populations. In the AFLP-derived MSAP technique, first developed by Reyna-López et al. (1997), MseI used in AFLP was replaced by a pair of isoschizomers, HpaII and MspI, that have the same cleavage site at 5′-CCGG sequences but differ in their sensitivities to the methylation state of cytosine. The following 10 selective amplification primer pairs were developed to detect methylation variation at CCGG sites: AA/TTG, AA/ TAC, AC/ATC, AC/ACT, AC/TCG, AC/TCT, AC/ACG, AG/ TGC, AG/TAG and AT/TCA. The detailed protocol can be found in Gao et al. (2010). After denaturation at 95 °C for 5 min, the final selective products were separated by 6% denaturing polyacrylamide gel electrophoresis. The gels were then stained in 0.15% AgNO 3 for 10 min and developed in 1500 ml of 2% NaOH containing 2 ml of formalin.
Genetic and epigenetic analysis The raw data were represented by a binary matrix, with band presence and absence coded as "1" and "0", respectively. The MSAP matrix was also transformed into unmethylated marker (MISP) and methylated marker matrices in which methylation types were characterized using the "Mixed Scoring 1" classification system described by Schulz et al. (2013): for MISP markers, only condition I (fragments present in both profiles) was scored as "1"; for methylated markers, both conditions II (fragments present only in EcoRI/MspI profiles) and III (fragments present only in EcoRI/HpaII profiles) were scored as "1". All subsequent analyses were based on the two AFLP and methylated marker data binary matrices. The following genetic and epigenetic diversity parameters were all estimated using GenAlEx 6.503 (Peakall and Smouse, 2012): number of different alleles (Na), number of effective alleles (Ne), Shannon's information index (I), expected heterozygosity (H E ), number of private bands (NPB), pairwise population differentiation values (PhiPT), percentage of polymorphic bands (PPB) and Nei's genetic and epigenetic distance. Analysis of molecular variance (AMOVA) based on 1,000 permutations was conducted in Arlequin 3.1 (Excoffier et al., 2005) to calculate the partitioning of variation within and among the eight geographic populations of T. arvense. In addition, principal coordinate analysis (PCoA) using Nei's genetic and epigenetic distance, which is based on the two kinds of markers, was performed to visualize relationships among the populations. The resulting distance matrices were also used to construct dendrograms based on the unweighted pair-group method with arithmetic averages in MEGA 4.1 followed by modification with the iTOL v4 online tool (https://itol.embl.de/).
To identify the number of subpopulations based on genetic and epigenetic backgrounds, Bayesian clustering was carried out in STRUCTURE 2.3.4 (Hubisz et al., 2009). As AFLP and MSAP markers are both dominant molecular markers, we defined "0" as the recessive allele at each locus (Falush et al., 2007). All Bayesian analyses were performed under the admixture model. To determine the optimal number of subpopulations (K), 10 runs were performed for each value of K from 2 to 8, with each run comprising a 10,000-iteration burn-in period followed by 10,000 iterations. The output results were then uploaded to Structure Harvester to seek the optimal K-value, which corresponded to the maximum ∆K value (Earl and vonHoldt, 2011).
To test whether the observed genetic and epigenetic differentiation associated with geographic distance accorded with classical isolation-by-distance (IBD) models, and whether the one-to-one genetic and epigenetic differentiation between the eight T. arvense populations have the same tendency, we also performed multiple Mantel tests based on 9,999 permutations in GenAlEx 6.503 (Peakall and Smouse, 2012).

RESULTS
Genetic distance and diversity In this study, 461 and 400 repeatable loci were respectively amplified in genetic and epigenetic analyses of the 131 individuals from eight geographic populations. High genetic and epigenetic identities were found between the eight populations (Table 2), with similar patterns observed in both types of analyses. In the genetic analysis, the highest identity was found between Pop1 and Pop3 (Nei's genetic identity = 0.993), and the lowest was between Pop4 and Pop6 (0.950). In the epigenetic analysis, the highest identity (0.991) was also between Pop1 and Pop3, while the lowest was between Pop5 and Pop6 (0.956). Cluster analyses of the genetic and epigenetic data grouped the eight populations into three clades (Fig. 2), one containing all high-elevation populations, one comprising Pop5 alone and a third consisting of all low-elevation populations plus middle-elevation Pop4.
In the genetic diversity analysis, 461 loci were clearly amplified, among which 4.12-19.96% were polymorphic among the eight geographic populations (Table 3). An average of 11.38% and 36.88% were polymorphic at population and species levels, respectively. NPB in the eight populations varied from 0 to 11 (mean = 5), and Na in each population ranged from 0.97 (Pop5) to 0.72 (Pop6), with an average of 0.85, and was 1.25 at the species In the epigenetic diversity analysis (Table 3), 400 distinct loci were amplified. PPB among the eight populations varied from 8.00% (Pop6) to 21.50% (Pop5), with an average of 13.78%, and 38.25% of bands were polymorphic at the species level. The average NPB among the eight populations was 5. Pop5 still had the highest NPB (12), while no private band was detected in Pop6. Furthermore, the maximum values of most other diversity indexes were detected in Pop5, whereas the minimum values were detected in Pop6.
According to PCoA plots (Fig. 3), low-and highelevation populations tended to cluster. In analyses of the genetic data (Fig. 3A), middle-elevation populations Pop4 and Pop5 did not cluster together. In contrast, PCoA of the epigenetic data grouped Pop4 with lowelevation populations (Fig. 3B). In addition, Pop2 and Pop6 were slightly less associated with other populations   level. Similar to the Na, the highest genetic diversity index value in the eight populations was in Pop5 (Ne = 1.12; I = 0.10; H E = 0.07); the lowest was in Pop6 (Ne = at the same elevation compared with their genetic clustering patterns. These groupings were consistent with the patterns revealed by population pairwise genetic and epigenetic identities. In STRUCTURE analyses of the genetic and epigenetic data, the maximum ∆K values were obtained at K = 5. The 131 individuals from the eight geographic populations were therefore assigned to five genetic (Fig.  4A) or epigenetic (Fig. 4B) ancestral Hardy-Weinberg equilibrium populations based on their potential genetic or epigenetic backgrounds. In the genetic clustering analysis (Supplementary Table S2), the ancestry membership participation of Pop1, Pop3 and Pop4 in cluster V was > 44% (44.7-96.5%), while Pop6, Pop7 and Pop8 in cluster I had > 63% (64.0-81.8%) of ancestry membership coefficients, and Pop2 and Pop5 had higher ancestry membership coefficients in cluster II and III, with 78.5% and 55.3%, respectively. In the epigenetic clustering (Supplementary Table S2), Pop1, Pop2 and Pop4 had > 63% (64.0-96.7%) of ancestry membership participation in cluster V, whereas Pop6, Pop7 and Pop8 had higher ancestry membership coefficients > 75% (75.7-81.9%) in cluster IV, and Pop3 and Pop5 had ancestry membership contributions of 53.6% and 55.8% to clusters II and III, respectively.
Most of the variation in the two kinds of markers uncovered by AMOVA (Table 4) existed within geographic populations (58-68%). Nevertheless, obvious genetic and epigenetic differentiation was detected: approximately 42% of the variation was assigned to population genetic differ-entiation (P < 0.001), while approximately 32% could be ascribed to population epigenetic divergence (P < 0.001).

IBD tests and correlations between genetic and epigenetic differentiation
Mantel tests uncovered significant correlations between geographic distance (km) and population differentiation (Fig. 5A, 5B). The correlation between geographic distance and population genetic differentiation was stronger (r = 0.5362, P < 0.01) than that between geographic distance and epigenetic population  All the AMOVAs were based on 1,000 permutations. differentiation (r = 0.4512, P < 0.05). This result demonstrates that population genetic and epigenetic differentiation in the analyzed T. arvense populations increased with geographic distance and that genetic differentiation was more evident under the IBD model. Furthermore, the most significant correlation (r = 0.7437, P < 0.001) detected by Mantel testing was between genetic differentiation and epigenetic differentiation (Fig. 5C), which indicates that local populations share similar patterns of genetic and epigenetic divergence.

DISCUSSION
Low genetic and epigenetic diversity In general, genetic diversity within allogamous species is three times larger than that within autogamous species (Ayala and Kiger, 1980), and heterozygosities within outcrossing populations are significantly higher than those within self-fertilizing ones (Nybom, 2004). In our eight geographic populations, low expected heterozygosities were identified by genetic (mean = 0.03, total = 0.07) and epigenetic (mean = 0.04, total = 0.07) analyses. As defined by Goodwillie et al. (2005), an outcrossing species has an outcrossing rate greater than 80%, whereas outcrossing rates below 20% are characteristic of selfing species; consequently, T. arvense, which has a selfing rate above 80%, is a selfing species (Best and McIntyre, 1975). A previous study had revealed low genetic diversity within populations in a predominantly selfing plant, Tacca chantrieri (Zhang et al., 2006), and the high proportion of autogamy also contributed to a very low level of intrapopulation genetic diversity in T. chantrieri (Chen et al., 2008). These findings were similar to our observations in T. arvense (PPB = 36.88%, I = 0.12, Na = 1.25). Thus, the limited genetic diversity could be largely attributed to the high proportion of autogamy in T. arvense. Other evolutionary processes, such as shorter colonization history, could also be related to the relatively low genetic diversity. Among the eight populations of T. arvense, Pop5, collected from the middle elevation, had the highest genetic diversity. Plant populations in glacial refugia and admixture zones possess the greatest genetic diversity (Jadwiszczak et al., 2015); however, a previous phylogeographic study on T. arvense revealed that the richest alleles of the populations on the eastern edge of the Qinghai-Tibet Plateau were attributed to the admixture zone, rather than the glacial refugia, even though the populations were located in the universally recognized glacial refugia of Hengduan Mountain . It was likely that the location where Pop5 thrived was a transition zone of high-and low-elevation lineages, and the special climate conditions enabled them to coexist there; thus, the richest genetic diversity found within Pop5 might be explained by the hybrid zone effect. The epigenetic diversity of the eight populations was very similar to their genetic diversity. The expected heterozygosities in epigenetics ranged from 0.03 to 0.08, with 0.04 on average and 0.07 at the species level. Only 38.25% of the 400 amplified loci were polymorphic, with the majority (61.75%) found to be monomorphic within the eight populations. These results contrast with previous reports of a higher rate of DNA methylation variation relative to nucleotide variation, and the prediction that the methylome evolves faster than the genome because of its environmental sensitivity (Jablonka and Raz, 2009;Becker et al., 2011;Foust et al., 2016). On the one hand, in our MSAP analysis, the absence of bands in both HpaII and MspI profiles (0, 0) might result from full methylation of the external cytosine, full methylation of both cytosines, hemimethylation of both cytosines or genetic variation. To avoid interference from genetic variation, "Mixed Scoring 1" (Schulz et al., 2013) was performed to split the MSAP matrix into MISP and methylated marker matrices, and only the methylated marker matrix was used to assess the epigenetic diversity and epigenetic structure. Consequently, part of the epigenetic variation may have been neglected in our analysis, resulting in the pattern that the epigenetic diversity was not higher than genetic diversity. Previous quantitative research on DNA methylation variation revealed that DNA methylation was unevenly distributed across the genome, but highly conserved between individuals that spanned divergent habitats (Trucchi et al., 2016). This is similar to our finding that up to 61.75% of total amplified loci were monomorphic; thus, DNA methylation was conserved in T. arvense even over a wide distribution range.
Distinguishing population divergence Accord- ing to AMOVA, approximately 42% of genetic variation existed among populations. This result indicated considerable genetic differentiation among the eight populations (F ST > 0.25) (Freeland, 2005). The mating system, which has effects on genetic diversity, also plays a role in genetic differentiation. The level of genetic variation among populations of outcrossing plant species is approximately 10-20%, increasing to approximately 50% in selfing species (Hamrick and Godt, 1989). The genetic differentiation observed in T. arvense in our study was similar to that in Leavenworthia alabamica (F ST = 0.45) and L. crassa (F ST = 0.36), two cedar glade endemic species that have both self-incompatible and self-compatible populations (Koelling et al., 2011), and was comparable to the average differentiation value (F ST = 0.43) among self-fertilizing or clonal plants ( Morjan and Rieseberg, 2004). Furthermore, genetic differentiation is related to gene flow, which is influenced by a wide range of factors, such as geographic distance, elevation, topography, climate and life history (Turner and Trexler, 1998;Guarnizo and Cannatella, 2013;Castillo et al., 2014;Polato et al., 2017). The frequency of gene flow across geographic regions is inversely proportional to geographic distance. In our study, the direct line distance between the eight populations ranged from 88 to 2,357 km, and, except for the middle elevation, a lower differentiation level was found between populations within the same elevation group, with 0.197-0.384 between the low-elevation populations and 0.230-0.393 between the high-elevation populations (Supplementary Table S3). Our IBD test (r = 0.5362, P < 0.01) also revealed significant genetic differentiation according to geographic distance. The eight populations of T. arvense spanned a large geographic scale, and the mountains, rivers, elevation and climate differences had clear effects on gene flow among geographic regions. Furthermore, we found divergent flowering time between the three elevations: the low-elevation populations flowered in the spring and the high-elevation populations flowered in summer, whereas the middle-elevation populations flowered in winter, resulting in prezygotic reproductive isolation. Apart from the gene flow, differentiated natural selection also contributed to the genetic divergence between local populations (Savolainen et al., 2007). The eight populations of T. arvense inhabited highly heterogeneous environments, and thus will have experienced divergent selective stresses. Several studies have described the selection on genetic variation which underpins adaptation to extreme environments, such as hypoxia, low temperature and high-dose ultraviolet radiation (Zhang et al., 2014;Zhang et al., 2016;Hallmark et al., 2019). Here, the high-elevation populations (Pop6, Pop7 and Pop8) collected from the Qinghai-Tibet Plateau had higher genetic differentiation from the other five populations collected from middle and low elevation (Supplementary Table S3). The Qinghai-Tibet Plateau is well known for its extreme environmental conditions (low temperature, low oxygen and strong ultraviolet radiation); the three high-altitude populations of T. arvense should have experienced strong natural selection, and adaptive evolution may have taken place in related genes, and the limited gene flow between high elevation and middle and low elevations may therefore shape the higher differentiation level. In our analysis of cytosine methylation variation within T. arvense, we showed significant epigenetic differentiation (F ST = 0.32) among the eight natural populations from divergent habitats, and a significant correlation (r = 0.4512, P < 0.05) was also found between geographic distance and epigenetic differentiation. Variation in DNA methylation is subject to natural selection, and thus these fitness-related variations should accumulate in populations (Hirsch et al., 2012). A study of 1,001 worldwide epigenomes of A. thaliana showed that changes in DNA methylation were related to geographic origins and local adaptation (Kawakatsu et al., 2016). Thus, the epigenetic variation within the eight populations of T. arvense has been confronted with different elements for natural selection, resulting in the epigenetic differentiation between populations. Meanwhile, limited gene flow also contributed to the epigenetic differentiation and shaped the significant correlation between geographic distance and epigenetic differentiation within the natural populations.
In this study, we not only showed a significant population structure in genetics and epigenetics, but also found a strong correlation between genetic differentiation and epigenetic differentiation (r = 0.7483, P < 0.001), which is consistent with the hypothesis that epigenetic variation is mainly due to genetic variation (Richards et al., 2017). Massive methylome sequencing in Arabidopsis and soybean identified about 23% and 35% of differentially methylated regions that were associated with genetic polymorphisms (Schmitz et al., 2013;Shen et al., 2018). The strong correlation between genetics and epigenetics is likely attributable to genetic control of epigenetic variation.
In conclusion, we selected the widespread annual T. arvense as a new model system to explore the spatial genetic and epigenetic structure of natural plant populations. Our analyses revealed that this annual species has low genetic and epigenetic diversity even though it grows in highly heterogeneous habitats, but genetic and epigenetic differentiation was evident between our sampled populations. Our findings indicate that mating system, geographical distribution and habitat play important roles in shaping the genetic and epigenetic structure of natural plant populations. Moreover, we noted that the variation in the methylome is often associated with genetic polymorphism. Further research on comparative genomics and population genetics in the context of high-throughput sequencing should accelerate the development and utilization of T. arvense, and provide insights into adaptation and evolution for widespread plant species.