Linkage disequilibrium in a population undergoing periodic fragmentation and admixture

Glacial and interglacial cycles are considered to have caused the fragmentation and admixture of populations in many organisms. A simple model incorporating such periodic changes of the population structure is analysed in order to investigate the behaviour of neutral genetic variation at one and two loci. The equilibrium is reached very quickly in terms of cycles if the length of a cycle is long, as would be expected of the glaciation cycles. Heterozygosity and linkage disequilibrium are shown to depend on the length of time of the fragmented and admixed phases, population sizes, and number ( n ) of subpopulations in the fragmented phase. If the population size is small in the fragmented phase and its duration is long, the squared correlation coefficient of two loci (a measure of linkage disequilibrium) just after the admixture is approximated by 1/( n –1) for n > 1. After admixture, the correlation decays at a rate of approximately twice the recombination rate. Therefore, if post-glaciation admixture created linkage disequilibrium, we expect to observe linkage disequilibrium even between moderately linked loci, and its decay pattern along the chromosome is very different from that in a random mating population at equilibrium. This is especially true in organisms with long generation times such as trees.


INTRODUCTION
From about 0.7 million years ago (MYA), approximate 100-k year cycles of ice sheet expansion and recession have been continuous (Webb and Bartlein, 1992). For many species, these cycles (glacial-interglacial cycles) have caused contractions and expansions of their ranges and have resulted in changes in their population structures. More concretely, in one phase (the fragmented phase), the species might retreat to one or a few isolated refugia and their population sizes might become small. In the other phase (the panmictic phase), the subpopulations expand and admix to form a large panmictic population. As an example of approximate panmixis, currently many wind-pollinated plants show only weak population structure (Hamrick and Godt, 1996). Thus, the populations undergo fragmentation and admixture periodically.
In the fragmented phase, variation within subpopulations decreases and variation between subpopulations increases. Admixture merges the variation between sub-populations into variation in the whole population. In such cases, the equilibrium variation is determined by population sizes in the two phases and the time lengths. Jesus et al. (2006) quantified this situation and have shown that variation can be greater than that expected for a panmictic population of constant size even when the population size is small in the fragmented phase (the structured phase of Jesus et al., 2006). Moreover, because population admixture has been known to create linkage disequilibrium (Nei and Li, 1973), we expect to observe stronger linkage disequilibrium compared to that in a panmictic population, especially during the early portion of the panmictic phase.
In the present paper, an analysis of a simple model of genetic variation at one and two neutral loci under periodic admixture-fragmentation is presented. Specific questions asked are; 1. What would be the equilibrium diversity and linkage disequilibrium just after admixture? 2. How quickly is this equilibrium reached? 3. How does the dependency of linkage disequilibrium on the recombination rate under this model differ from that in a panmictic population? Edited by K. Ryo Takahashi * Corresponding author. E-mail: htachida@kyudai.jp

MODEL
We consider a monoecious diploid population with discrete non-overlapping generations and assume that the population's structure changes periodically. Each cycle consists of two phases in the following order: the panmictic and the fragmented phases ( Fig. 1). At the beginning of the panmictic phase, existing subpopulations in the fragmented phase are merged in equal proportions to form a panmictic population with size N 1 and this state continues for t 1 generations. At the beginning of the fragmented phase, the population is subdivided into n subpopulations each of which has size N 2 and this state continues for t 2 generations. In this phase, there is no migration between subpopulations and mating is random in each subpopulation. This cycle is repeated with the same parameters. This model is the same as the one studied by Jesus et al. (2006). We would like to know the diversity measurements of the population at one and two loci (defined below) after k cycles of the fragmentation admixture.
Consider two neutral loci A and B and the recombination rate between them is r. For mutations, the infinite allele model (Kimura and Crow, 1964) is assumed, and u (> 0) is the mutation rate per generation at each locus.
We compute the average heterozygosity at one locus (one-locus diversity measure) and joint heterozygosity (two-locus diversity measure) at two loci under the model by deriving their transition equations. Equivalent results can also be obtained by a genealogical approach as in Jesus et al. (2006) and McVean (2002) or using the diffusion approximation of the Wright-Fisher model as in Ohta and Kimura (1969) and Mano (2007).

ANALYSES
In the following, one-and two-locus diversity measures at the beginning of the kth cycle are calculated. Once they are obtained, it is easy to compute those measures at any time in the cycle using the measures at the beginning of the kth cycle as initial values.

One-locus diversity measures
Here, we consider one of the two loci. Let θ k be the probability that two alleles randomly taken from the population at the beginning of the kth cycle are different. At this time point, the panmictic phase has just started. Let p i be the frequency of the ith allele A i . Then, θ k is expressed as: where E[ ] denotes the expectation over the joint distribution of p i 's. We will derive a transition equation that relates θ k+1 to θ k . Define the following quantities: where ν = (1-u) 2 . ω i is the probability that there is neither coalescence nor mutation of two alleles in one generation and is the equilibrium value of the probability that two alleles randomly taken from a random mating population with a constant population size N i are different.
First, we consider the panmictic phase and let θ pt be the probability of sampling two different alleles from the population at the tth generation from the start of the panmictic phase. Then, (1) (see, for example, Cockerham, 1984).
Next, we consider the fragmented phase and let θ st and θ dt be the probabilities of sampling two different alleles from the same subpopulation and different subpopulations, respectively, at the tth generation from the start of the fragmented phase of the kth cycle. Because two alleles taken at the beginning of the fragmented phase are samples from the panmictic population at the end of the panmictic phase irrespective of whether they are taken from the same or different subpopulations, we obtain: (2) Because the size of each subpopulation is N 2 , (4) at the end of the fragmented phase.
Combining (2)-(4) and noting that the subpopulations contribute equally to the panmictic population at the beginning of the k+1th cycle, we obtain a transition equation: where and it represents the equilibrium value of θ k .
Note that Λ is less than or equal to the larger of and and appreciably smaller than one unless t 1 + t 2 is small. Therefore, the approach to the equilibrium is fast and the equilibrium value is of practical interest.
Numerical examples of the equilibrium value, θ ∞ , as a function of t 2 /t 1 for various values of n when t 1 /(2N 1 ) = 0.1, N 2 /N 1 = 0.01 and 4N 1 u = 0.04 are shown in Fig. 2. Here, we assumed N 1 , N 2 >> 1 and u << 1, so that θ ∞ can be approximately expressed as a function of scaled times, t 1 / (2N 1 ) and t 2 /(2N 2 ), and the scaled mutation rate 4N 1 u. The case with n = 1 is just the periodic size change circumstance, and thus θ ∞ decreases as t 2 increases. If n > 1, θ ∞ first decreases as t 2 increases, but then it increases and becomes larger than that expected in a randomly mating population with a larger size N 1 . This is because variation within subpopulations is lost but variation between subpopulations accumulates as time passes in the fragmented phase. The accumulation of variation between subpopulations is faster when n is larger. When n = ∞, the initial decrease does not occur because two alleles taken at the beginning of the cycle never come from the same subpopulation in the fragmented phase. Jesus et al. (2006) have also shown these using a genea-logical approach.
Two-locus diversity measures Next we consider two loci A and B. Let the alleles of the two loci be designated by A i and B j , respectively. Define g ij , p i , and q j as frequencies of a gamete A i B j , allele A i , and allele B j , respectively.
In order to define the diversity measures for two loci, we introduce additional information. Assume that we sample two alleles a, a′ at the A locus and b, b′ at the B locus from the population. There are several ways in which sampled alleles are located on the chromosomes in the population (Weir and Cockerham, 1969). If alleles are on the same chromosome, we concatenate the two alleles. For example, if a and b are on the same chromosome, we write them as ab. If they are on different chromosomes from the same (sub)population, we separate the two alleles by a comma, such as a,b. Finally, alleles in the same (sub)population are put within parentheses. Thus, two alleles (a,b) are from the same (sub)population and (a)(b) are from two different (sub)populations. We indicate that two alleles x, y are different by . With these notations, we define three two-loci diversity measures at the beginning of the kth cycle as (see Weir and Cockerham, 1969): and a column vector x k = (Θ k , Γ k , Δ k , θ k ,) T where the superscript T denotes a transposition. The three measures, Θ k , Γ k , Δ k , are called digametic, trigametic and quadrigametic measures because the alleles are located on two, three and four gametes, respectively.
In order to compute the transition from x k to x k+1 , we first consider the panmictic phase. Let Θ t , Γ t , and Δ t be the diversity measures at the tth generation from the start of the kth cycle. Here, we derive the transition equation for the digametic measure, which relates Θ t+1 to Θ t . Those for Γ t , and Δ t can be derived similarly. We consider three cases, (a) no mutation occurs at either locus (with a probability ν 2 ), (b) mutation occurs at one of the loci and it does not occurs at the other loci (with a probability 2ν (1-ν)) and (c) mutation occurs at both loci (with a probability (1-ν) 2 ).
In case (a), if neither of the two gametes is a recombinant (with a probability (1-r) 2 ), both two allele pairs, (a, a′) and (b, b′), are different when the two gametes come from different chromosomes (with a probability 1-1/(2N 1 )) and the digametic two allele pairs in the previous generation are different (with a probability Θ t ). If one of the two gametes is a recombinant but the other is not (with a probability 2r(1-r)), the two allele pairs are different when they come from different individuals (with a probability 1-1/N) and the trigametic two allele pairs in the Fig. 2. One-locus equilibrium diversity measure, θ ∞ , as a function of t 2 /t 1 for various values of n are shown. Other parameters are u = 10 -7 , N 1 = 10 5 , t 1 = 2 × 10 4 , N 2 = 10 3 .
Note that if both gametes come from the same individual in the previous generation, either of the two allele pairs comes from the same allele in the previous generation. If both gametes are recombinants (with a probability r 2 ), the two allele pairs are different either if they come from different individuals (with a probability 1-1/N) and the quadrigametic two allele pairs in the previous generation are different (with a probability Δ t ), or if both gametes come from the same individual (with a probability 1/N) but the alleles come from different alleles in this individual (with a probability 1/2) and the digametic two allele pairs in the previous generation are different (with a probability Θ t ).
In case (b), because one of the two allele pairs are different, only the other allele pair need to be different for the two allele pairs to be different. The other allele pair are different if they come from different alleles in the previous generation (with a probability 1-1/(2N)) and they are different (with a probability θ pt ). Note that recombination does not affect this case because whether the allele pair at the mutated locus come from the same allele in the previous generation or not does not affect the non-identity of the allele pair. Finally, in case (c), both allele pairs are different independent of how they come from the previous generations. Combining all those with some rearrangements, we obtain the transition equation for Θ t as, where a = 1/(2N 1 ). If we set y t = (Θ t , Γ t , Δ t ) T , it satisfies the following transition equation (Serant, 1974): In this equation, 1 is a column vector with all elements as one and A is a 3 by 3 matrix, With this initial condition, (8) can be solved as (9) where I is a 3 by 3 identity matrix. If we put t = t 1 in (9), we obtain the two-locus diversity measures at the end of the panmictic phase. For later use, we add a one-locus measure to the vector and redefine it as y t = (Θ t , Γ t , Δ t , θ pt ) T . Noting that the initial condition for y t is x k , y t at the end of the panmictic phase is expressed as a linear function of x k , where elements of P 1 and q 1 are obtained from (9) and (1).
In order to calculate the transition equation in the fragmented phase, we need to introduce more two-locus diversity measures because there are several ways in which sampled alleles are located on chromosomes (Tachida and Cockerham, 1986). For simplicity, we assume that the initial condition is symmetric with regard to the two loci. Even with this assumption, we need 13 measures, two digametic, three trigametic and seven quadrigametic measures that are listed in Table 1 along with their allele arrangements.
Let t be the time from the beginning of this phase measured in generations. The initial condition for the measures are Θ t1 , Γ t1 , and Δ t1 for digametic, trigametic and quadrigametic measures, respectively. We want to compute values of those measures z t = (Θ 1t , Θ 2t , Γ 1t , ..., Δ 7t , θ st , θ dt ) T at the end of the fragmented phase (t = t 2 ). Although the computation is straightforward, the expressions become complex. Therefore, the derivations are explained in Appendix A. As in the panmictic phase, z t is expressed as a linear function of the initial value (11) where P 2 and q 2 can be computed from (A1)-(A8) in Appendix A.
Because n subpopulations contribute equally to the panmictic populations of the next cycle, we obtain the three two-locus diversity measures at the beginning of the k+1th cycle using elements of as, We can express these along with (5) in a vector form: a r ar a 2 2 1 1 2 2 1 1 2 1 2 1 2 1 2 1  3 + + − Δ Δ ) ( ) ))), Periodical change and linkage disequilibrium (12) where H = (h i,j ) is a 4 by 15 matrix, whose non-zero elements are, All other elements are zero. Finally, we combine (10), (11) and (12) to obtain the transition equation from k to k+1 as: (13) The equilibrium values x k is obtained as: Linkage disequilibrium Here, we choose ρ 2 (r 2 of Hill and Robertson, 1968) as a measure of linkage disequilibrium and compute its approximate expectation using the diversity measures obtained in the preceding sections.
The expectation of ρ 2 is expressed approximately as (Hill, 1975): The right-hand side of the equation is of Ohta and Kimura (1969). For the accuracy of the approximation, consult Mano (2007). The denominator and numerator can be expressed using the two-locus diversity measures as: (see Weir and Hill, 1980). Therefore, E[ρ 2 ] is approximately expressed as: We can compute E[ρ 2 ] at the beginning of the panmictic phase by putting the equilibrium solution (14) or transient values computed from (13) into (16).
Before showing numerical examples using (16), we consider an extreme case where the recombination rate is large and the subpopulations in the fragmented phase are very small. More formally stated, we assume Nr >> 1 and t 2 /(2N 2 ) >> 1. Then, at the end of the fragmented phase, each subpopulation becomes monomorphic. Therefore, all diversity measures involving two alleles at the same locus in the same subpopulation become zero. That is: (17) (see Table 1). Here, we omitted subscript t 2 for simplicity. All other measures are approximately expressed as (θ d ) 2 because we assumed Nr >> 1. Putting those into (12) and using (16), we obtain (18) This equation shows that, even for loci not tightly linked, we expect to observe linkage disequilibrium at the beginning of the panmictic phase and that its extent will be larger when the number of subpopulations is small. Now let us examine linkage disequilibrium numerically using (13), (14) and (16). First, we ask how quickly the equilibrium state is reached. This can be answered by computing the eigenvalues of the matrix, HP 2 P 1 . Let λ i (i = 1, ..., 4) be the eigenvalues of the matrix defined in decreasing order. Numerical values of λ 1 (the largest) and λ 2 (the second largest) are shown in Table 2. Numerical computations show that λ 1 is the eigenvalue of θ k , where ω i = (1u) 2 (1 -1/(2N i )) and ν = (1u) 2 . Because , the equilibrium is obtained fairly quickly unless 2u(t 1 + t 2 ) is very small. We can also see that λ 1 monotonically increases to as n becomes larger. For large n, if we increase t 2 keeping t 1 + t 2 constant, λ 1 increases because ω 1 < ν. The second largest eigenvalue is also of interest because the behaviour of the two-locus measures also depends on them. As expected, λ 2 increases monotonically as r decreases and it achieves a value close to that of the largest eigenvalue when r is comparable to u. To see more directly how quickly E[ρ 2 ] approaches equilibrium at the beginning of the panmictic phase, its change was computed along with the one-locus diversity measure θ k . A typical example is shown in Fig. 3. We assumed that the population has been panmictic with size N 1 and in the equilibrium state before the start of the first cycle. In this example, λ 1 = 0.5927 and λ 2 = 0.4122. It takes several cycles to achieve the equilibrium state and E[ρ 2 ] approaches the equilibrium more quickly than θ k .
Next we examine the equilibrium state. Figure 4 shows E[ρ 2 ] at the beginning of the panmictic phase as a function of the recombination rate for various values of n. As shown in the figure, E[ρ 2 ] decreases as r increases and  ) .

x I HPP H Pq q
approaches 1/(n-1) (see (18)) except in the case of n = 2, because t 2 /(2N 2 ) = 8 used in the figure is very large. Therefore, strong linkage disequilibria are expected at the beginning of the panmictic phase when t 2 /(2N 2 ) is large. As n increases, E[ρ 2 ] decreases. When the length, t 2 , of the fragmented phase becomes shorter, the level of linkage disequilibrium decreases as shown in Fig. 5. Note that admixture results in linkage disequilibrium even without fixation within subpopulations, although its absolute value is not large (see the case of t 2 = 2N 2 ). Finally, we consider E[ρ 2 ] at the tth generation since the start of the panmictic phase. Because the approach to equilibrium in terms of cycles of the fragmentationadmixture is rapid, only the equilibrium state will be considered. We can compute E[ρ 2 ] using the equilibrium value, x ∞ , as its initial condition and putting it into (9). When 2Nr >> 1, the time course of E[ρ 2 ] can be approximately computed by ignoring the terms containing 1/(2N 1 ) in (9). In this case, the two-locus diversity measures are approximately expressed as: Therefore, we obtain: Table 2. The largest and second largest eigenvalues of HP 2 P 1 n = 2 n = 5 n = 10 Other parameters are N 1 = 10 5 , t 1 = 10 5 -t 2 , u = 10 -6 . λ 1 does not depend on r. Fig. 3. Changes of θ k and E[ρ 2 ] at the beginning of the panmictic phase. Parameters are u = 10 -6 , r = 10 -5 , N 1 = 10 5 , N 2 = 10 3 , t 1 = 2 × 10 4 , t 2 = 8 × 10 4 . Until the beginning of cycle 1, the population has been panmictic with size N 1 and in the equilibrium state. Fig. 4. Equilibrium values of linkage disequilibrium, E[ρ 2 ], and the number of subpopulations. E[ρ 2 ] is shown as a function of the recombination rate r. Four cases (n = 2, 3, 5, 10) are shown. Other parameters are u = 10 -7 , N 1 = 10 5 , N 2 = 5 × 10 3 , t 1 = 2 × 10 4 , t 2 = 8 × 10 4 . This relationship has been derived previously by Littler (1973) and shows that E[ρ 2 ] is reduced by a factor (1-r) 2 every generation when 2Nr is large. One example of the time course of the decay of E[ρ 2 ] is shown in Fig. 6. The straight and broken lines were computed from (9) and (20), respectively. The latter approximates the former quite well for large r and small t. We can see that linkage disequilibrium quickly decays when r is large, but it persists for a long time otherwise. The half-life of E[ρ 2 ] is approximately -log2/(2log(1-r)) ≈ 0.347/r. Thus, for example, it would take about 350 generations for E[ρ 2 ] of two loci 0.1 cM apart to decay to one-half of the values at the beginning of the panmictic phase. As t becomes large, E[ρ 2 ] approaches the equilibrium value for a randomly mating population of size N 1 , as approximately given by Ohta and Kimura (1971) when Nu is much smaller than 1, where R = N 1 r. This value is quite small unless r is comparable to u (see Fig. 6).

DISCUSSION
In this paper, the diversity measures at one and two loci, and a measure of linkage disequilibrium were investigated in a population undergoing periodic fragmentation and admixture. The study was motivated by the periodic nature of glaciation-interglaciation geological conditions. Under these circumstances, equilibrium is quickly approached and the size, N 2 , of subpopulations and time length, t 2 , of the fragmented phase, along with the number of subpopulations, strongly influence the levels of diversity measures and linkage disequilibrium. The parameters, N 2 , n, and t 2 , together determine the reduction of variation within subpopulations during the fragmented phase. After only one cycle of fragmentation and admixture, patterns of variation very different from those expected in panmictic populations are observed.
Such large linkage disequilibrium is also expected under models with geographically structured populations (Ohta, 1982;Tachida and Cockerham, 1986;De and Durrett, 2007). However, large estimates of F ST (if subpopulations are recognized) or F IS (if subpopulations are not recognized) are expected in geographically structured populations. On the other hand, estimates of those Fstatistics are very small under the present model after admixture. Therefore, it is not difficult to discriminate the two types of models from the data. Incidentally, large estimates of linkage disequilibrium may lead to spurious detection of positive selection using the decay of the extended haplotype homozygosity as pointed out by De and Durrett (2007). Therefore, it is important to obtain information of the past population structure to infer actions of selection.
Because we made some simplifying assumptions in drawing those conclusions, we will briefly consider some of their effects here. First, we assumed that transitions between the phases are instantaneous in the model. In reality, the expansion after glaciation is not instantaneous, although it is not slow (Hewitt, 2000). For example, in the case of the Japanese cedar, Cryptomeria japonica, its expansion started from a few refugia about 15000 years ago and its maximum abundance and range was reached around 7000 years ago (Tsukada, 1982). Therefore, N 2 is not constant through time. To address this violation, we might use effective size such as the har- Fig. 5. Equilibrium values of linkage disequilibrium, E[ρ 2 ], and the length of the fragmented phase. E[ρ 2 ] is shown as a function of the recombination rate r. Four cases (t 2 = 9N 2 , 8N 2 , 5N 2 , 2N 2 ) keeping the total length of one cycle constant are shown. Other parameters are u = 10 -7 , n = 2, N 1 = 10 5 , N 2 = 10 4 , t 1 = 10 5 -t 2 .  (20), respectively. Approximate equilibrium values (Eq.) for N 1 = 10 5 calculated from (21) are also shown. Parameters are u = 10 -7 , N 1 = 10 5 , N 2 = 5 × 10 4 , t 1 = 2 × 10 4 , t 2 = 8 × 10 4 .
monic mean for N 2 because an important factor is the extent to which variation is reduced within subpopulations in the fragmented phase. For the change of linkage disequilibrium, formulas in Mano (2007) may be useful. Second, although all parameters were assumed to be constant throughout the cycles in our model, some parameters might change between cycles. Indeed, glacialinterglacial cycles have been only quasi-periodic and some long-term trends might be imposed on them (see Webb and Bartlein, 1992). In addition, effective population sizes and number of subpopulations in the fragmented phase could fluctuate. Among those parameters, the time lengths are determined mainly by orbital-forcing of climatic oscillations and thus an approximate periodicity seems to be the case. Random factors, however, are likely to contribute to the resulting population sizes and the number of subpopulations as they are biological attributes. Therefore, their values might change randomly among the cycles of fragmentation and admixture. Especially important in determining the magnitude of linkage disequilibrium is the number, n, of subpopulations. Palynological studies (studies based on pollen and spores) have shown that some species were missing in some refugia in certain glacial periods (Bennett, 1997). Although a consideration of parameter fluctuation is important to understand the genetic diversity of organisms affected by repeated fragmentation-admixture, our results suggest that the most recent cycle is of primary importance because equilibrium is reached fairly quickly. Hence, the prediction of strong linkage disequilibrium from our model does not change qualitatively unless n in the most recent fragmentation phase is one.
Finally, although we assumed that n subpopulations in the fragmented phase equally contribute to the panmictic populations when admixture occurs, this is unlikely to be true in nature. Here, we consider the effects of an unequal contribution in the extreme case of Nr >> 1, and t 2 /(2N 2 ) >> 1. For n = 2, E[ρ 2 ] is still close to one because it is a correlation coefficient. To consider the case with n > 2, let x i be the contribution from the ith subpopulation to the panmictic population. Then, instead of (18), we obtain: where and Therefore, we might define an effective subpopulation number, n e , by equating the right-hand side of this equation to 1/(n e -1), as Because E[ρ 2 ] is a decreasing function of n e , any unequal contribution that reduces n e will increase E[ρ 2 ].
Many of the extant abundant taxa are considered to have experienced fragmentation during the glaciation period (Webb and Bartlein, 1992). Therefore, admixture of their fragmented subpopulations is considered to have occurred between approximately 5000 and 15000 years ago. Because the half-life of the linkage disequilibrium created by admixture is approximately 0.347/r generations, we expect to observe more than half the level of linkage disequilibrium at the beginning of the interglaciation between two markers less than 34.7/t cM apart, where t is the number of generations since the admixture. Therefore, extended linkage disequilibrium will be found in organisms having longer generation times. Recently, fine linkage maps have been developed in many organisms and total genome sequences are available in some model organisms. Thus, we can choose markers suitable for observing linkage disequilibrium as traces of past fragmentation. For example, if we assume that the generation time of a tree is 50 years, only one to three hundred generations have passed since the admixture after glaciation occurred. Then, we should be able to find linkage disequilibria between markers 0.35-0.11 cM or less apart if it existed when the admixture occurred. Because such markers are now available for trees (e.g. Brown et al., 2001;Tani et al., 2003;Tuskan et al., 2006), it would be interesting to survey those markers in natural populations to investigate their past population structure (see also Lascoux et al., 2008).