Edited by Hidenori Tachida. Hiroaki Iwaisaki: Corresponding author. E-mail: iwsk@agr.niigata-u.ac.jp
In recent years, a variety of statistical methods for detecting and locating quantitative trait loci (QTLs) using genetic marker information have been developed (e.g., Hoeschele et al., 1997). Among these, interval mapping with a pair of flanking markers (Lander and Botstein, 1989) allows the effect and the location of a QTL to be estimated separately, whereas these estimates are confounded with each other when only a single marker is used. Interval mapping is therefore likely to be more powerful for QTL mapping than the single-marker approach, and it has been widely used in animal and plant populations.
This method assumes that the marker interval contains one QTL, and the remaining QTLs in the same linkage group are usually not taken into account. Therefore, if there are several QTLs with moderate to large effects on the chromosome, conventional interval mapping can lead to reduced power of QTL detection, biased estimates of the relevant parameters, and detection of a ghost QTL (Knott and Haley, 1992; Martinez and Curnow, 1992; Wright and Kong, 1997). Jansen (1993) and Zeng (1994) addressed these problems by proposing an improved method, the so-called composite interval mapping, which absorbs the effects of QTLs located outside the marker interval by including additional markers as cofactors in the model, and increases mapping resolution by reducing the influence of co-segregating QTLs. However, when tightly linked QTLs or a cluster of QTLs exist in the same interval, (composite) interval mapping procedures may lead to wrong inferences, since they too usually assume that one QTL is located in the marker interval.
The objective of this study was to investigate, by deterministic sampling, the properties of several interval mapping procedures when in truth a single QTL or multiple QTLs lie within one marker interval. Deterministic sampling has previously been applied to examine the behavior of parameter estimates and the convergence of the estimated QTL location to an incorrect position on the chromosome under model misspecification (Mackinnon and Weller, 1995; Mackinnon et al., 1996; Wright and Kong, 1997). In addition to conventional interval mapping, we considered regression interval mapping (Kao, 2000) and ‘QTL-cluster mapping’ using the chromosome segment model (Matsuda and Iwaisaki, 2001, 2002), which uses information on the conditional identity-by-descent (IBD) proportion for the marked chromosome region. The true genetic models for the quantitative trait were described by parameters in which the number of QTLs contained in the marker interval, the linkage phase, and the QTL effect(s) were varied.
In this study, a design for QTL mapping based on a single sire family with n half-sib offspring in an outbred population, that is, the half-sib design resulting from random mating between a single sire and n unrelated dams, was used. The sire was assumed to be heterozygous for two marker loci, 1 and 2, flanking the QTL(s), where the map distance between the two loci is l (in Morgans) and the corresponding recombination rate is $R = 0.5(1 - e^{-2l})$ (Haldane, 1919). The marker-QTL linkage phase of the sire was assumed to be known, with the ordered genotype M1QM2/m1qm2, where the alleles at the QTL and the two marker loci are (Q, q), (M1, m1) and (M2, m2). A half-sib offspring can then receive one of four marker haplotypes from the sire, h1 = M1M2, h2 = m1m2, h3 = M1m2 and h4 = m1M2, and the probabilities of these events, $P(h_j)$ (j = 1, ..., 4), are (1 − R)/2, (1 − R)/2, R/2 and R/2, respectively, since h1 and h2 are nonrecombinant and h3 and h4 are recombinant.
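The Haldane mapping function and the marker haplotype probabilities can be sketched in a few lines. This is a minimal illustration, not the authors' program: the helper names `haldane_r` and `marker_haplotype_probs` are hypothetical, and haplotypes are keyed by name (nonrecombinant M1M2 and m1m2 at (1 − R)/2 each, recombinant M1m2 and m1M2 at R/2 each) so that the mapping does not depend on how the four haplotypes are indexed.

```python
import math

def haldane_r(d_morgans):
    """Haldane (1919) mapping function: map distance (Morgans) -> recombination rate."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def marker_haplotype_probs(l_morgans):
    """P(h_j) for the four marker haplotypes an offspring can receive from a sire
    of genotype M1 ... M2 / m1 ... m2 with known linkage phase."""
    R = haldane_r(l_morgans)
    return {
        "M1M2": (1 - R) / 2,   # nonrecombinant
        "m1m2": (1 - R) / 2,   # nonrecombinant
        "M1m2": R / 2,         # recombinant
        "m1M2": R / 2,         # recombinant
    }

probs = marker_haplotype_probs(0.20)   # 20-cM interval, as in the deterministic simulation
for hap, p in probs.items():
    print(hap, round(p, 4))
```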
Three procedures for mapping QTL(s) were compared here. The first is conventional interval mapping (IM), which fits a model with a single putative QTL within the marker interval. The IM model for a single half-sib family is
$$y_i = \mu + \alpha g_i + e_i, \qquad (1)$$
where $y_i$ is the phenotypic value of offspring i, μ is the overall mean of the family, α is the substitution effect at the putative QTL of the sire, with the effects of the two alleles Q and q equal to α/2 and −α/2, respectively, $g_i$ is an indicator variable taking the value 1/2 if offspring i receives the Q allele from the sire and −1/2 if it receives the q allele, and $e_i \sim N(0, \sigma^2)$ is the residual effect. Denote the position of the putative QTL in the marker interval by $\hat d$, and the resulting recombination rates between the QTL and the left and right markers (1 and 2) by $\hat r_L = 0.5(1 - e^{-2\hat d})$ and $\hat r_R = 0.5(1 - e^{-2(l - \hat d)})$, respectively. Writing the probabilities of transmission of the QTL alleles from the sire to an offspring with marker haplotype $h_j$ as $P(Q \mid h_j) = p_j$ and $P(q \mid h_j) = 1 - p_j$, and noting that recombination in the two sub-intervals is independent under Haldane's mapping function, so that $R = \hat r_L + \hat r_R - 2\hat r_L \hat r_R$, the conditional probabilities are

$$p_1 = \frac{(1-\hat r_L)(1-\hat r_R)}{1-R}, \quad p_2 = \frac{\hat r_L\,\hat r_R}{1-R}, \quad p_3 = \frac{(1-\hat r_L)\,\hat r_R}{R}, \quad p_4 = \frac{\hat r_L\,(1-\hat r_R)}{R}$$

for marker haplotypes $h_1$ to $h_4$, respectively.
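The conditional probabilities $p_j$ can be checked numerically with the minimal sketch below. It assumes no interference (Haldane), as in the text; the function names are hypothetical. A useful sanity check is that, averaged over the four marker haplotypes, the sire transmits Q with probability exactly 1/2.

```python
import math

def haldane_r(d):
    """Haldane mapping function: map distance (Morgans) -> recombination rate."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def prob_Q_given_haplotype(d_hat, l):
    """p_j = P(Q | h_j) for a putative QTL d_hat Morgans to the right of marker 1.

    Assumes the sire is M1 Q M2 / m1 q m2 and independent recombination in the
    two sub-intervals (no interference).
    """
    rL, rR, R = haldane_r(d_hat), haldane_r(l - d_hat), haldane_r(l)
    return {
        "M1M2": (1 - rL) * (1 - rR) / (1 - R),  # nonrecombinant
        "m1m2": rL * rR / (1 - R),              # nonrecombinant
        "M1m2": (1 - rL) * rR / R,              # recombinant
        "m1M2": rL * (1 - rR) / R,              # recombinant
    }

p = prob_Q_given_haplotype(0.10, 0.20)   # QTL in the middle of a 20-cM interval
print({h: round(v, 4) for h, v in p.items()})
```

At the midpoint the two recombinant haplotypes carry no information about the QTL allele ($p_3 = p_4 = 1/2$), which is why IM information about position comes mainly from the imbalance between those groups as $\hat d$ moves.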
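The likelihood of the phenotypic value y conditional on marker haplotype $h_j$ under the IM model is then the two-component normal mixture

$$\hat L_j = \hat L(y \mid h_j; \hat\theta, \hat d) = p_j\,\phi\!\left(y;\ \hat\mu + \hat\alpha/2,\ \hat\sigma^2\right) + (1 - p_j)\,\phi\!\left(y;\ \hat\mu - \hat\alpha/2,\ \hat\sigma^2\right), \qquad (2)$$

where $\phi(y; m, s^2) = (2\pi s^2)^{-1/2}\exp\{-(y - m)^2/(2s^2)\}$ is the normal density and $\hat\theta$ denotes the vector of the estimates of the unknown parameters μ, σ² and α, excluding the estimate of the QTL position $\hat d$.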
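As a small illustration, the IM likelihood for one offspring is a mixture of two normal densities weighted by $p_j$. The sketch below is a minimal version with hypothetical function names; it takes $p_j$ as an input rather than recomputing it.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def im_likelihood(y, p_j, mu, alpha, sigma):
    """IM likelihood of phenotype y given marker haplotype h_j.

    With probability p_j the offspring carries the sire's Q allele (mean mu + alpha/2),
    otherwise the q allele (mean mu - alpha/2); residuals are N(0, sigma^2).
    """
    return (p_j * normal_pdf(y, mu + alpha / 2, sigma)
            + (1 - p_j) * normal_pdf(y, mu - alpha / 2, sigma))

# Recombinant offspring with a putative QTL at the midpoint (p_j = 0.5):
print(im_likelihood(0.3, 0.5, 0.0, 0.5, 1.0))
```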
QTL-cluster mapping (CM) was also considered; it models the transmission of the whole marked chromosome segment rather than that of a single QTL. The CM model can be formulated as
$$y_i = \mu + D R_i + e_i \qquad (3)$$
with $y_i$, μ and $e_i$ as defined in the IM model (1), where D represents the substitution effect for the marked segment and, for an offspring i carrying marker haplotype $h_j$, $R_i = \tfrac{1}{2}[E(I_p \mid h_j) - E(I_m \mid h_j)]$, with $I_p$ and $I_m$ the proportions of the marked segment that are IBD between the gamete the offspring received from the sire and, respectively, the paternal and the maternal gamete of the sire. Applying the procedure of Guo (1994a, b, 1995) with Haldane's (1919) mapping function, the conditional expectations $E(I_p \mid h_j)$ and $E(I_m \mid h_j)$ can be expressed as

$$E(I_k \mid h_j) = \frac{1}{l}\int_0^l \frac{P_{sk}(t)\,P_{ke}(l - t)}{P_{se}(l)}\,dt, \qquad I_0 = I_p,\ I_1 = I_m, \qquad (4)$$

with the transition probabilities between two points t Morgans apart

$$P_{ab}(t) = \tfrac{1}{2}\left[1 + (-1)^{|a - b|}\,e^{-2t}\right], \qquad a, b \in \{0, 1\}, \qquad (5)$$

where k indicates the paternal (k = 0) or maternal (k = 1) gamete of the sire, and s and e are variables, each taking 0 or 1, that represent the origin of the allele transmitted from the sire to the offspring at the flanking marker loci at t = 0 and t = l, respectively. For example, s = 0 and e = 0 indicate the offspring haplotype M1M2; likewise, (s, e) = (1, 1), (0, 1) and (1, 0) correspond to m1m2, M1m2 and m1M2.
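The conditional expectations $E(I_p \mid h_j)$ and $E(I_m \mid h_j)$ can be evaluated numerically. The sketch below assumes that Guo's expected IBD proportion is the average, over the segment [0, l], of the probability that a point on the offspring's sire-derived gamete descends from sire gamete k, given the marker origins s and e; it uses Haldane transition probabilities and a composite trapezoidal rule. Function names and the step count are illustrative assumptions.

```python
import math

def transition(a, b, t):
    """Haldane transition probability from origin a to origin b over t Morgans."""
    sign = 1.0 if a == b else -1.0
    return 0.5 * (1.0 + sign * math.exp(-2.0 * t))

def expected_ibd(k, s, e, l, steps=2000):
    """E(I_k | s, e): segment-averaged probability that the transmitted gamete
    descends from sire gamete k (0 = paternal, 1 = maternal), given the origins
    s and e at the left and right markers. Composite trapezoidal rule."""
    h = l / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * transition(s, k, t) * transition(k, e, l - t)
    return total * h / (transition(s, e, l) * l)

l = 0.20
for (s, e), hap in {(0, 0): "M1M2", (1, 1): "m1m2", (0, 1): "M1m2", (1, 0): "m1M2"}.items():
    e_p, e_m = expected_ibd(0, s, e, l), expected_ibd(1, s, e, l)
    print(hap, round(e_p, 4), round(e_m, 4), "R_i =", round(0.5 * (e_p - e_m), 4))
```

For a nonrecombinant haplotype the integral has the closed form $1/2 + \tanh(l)/(2l)$ (derived here from the Haldane transition probabilities, not stated in the source), while for a recombinant haplotype symmetry gives exactly 1/2, so $R_i = 0$ for those offspring.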
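For the CM method, the likelihood of the phenotypic value y conditional on marker haplotype $h_j$ is

$$\hat L_j = \frac{1}{\sqrt{2\pi\hat\sigma^2}}\exp\left\{-\frac{\left(y - \hat\mu - \tfrac{1}{2}\hat D\left[E(I_p \mid h_j) - E(I_m \mid h_j)\right]\right)^2}{2\hat\sigma^2}\right\}. \qquad (6)$$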
Note that this CM likelihood is similar to the likelihood of regression interval mapping (RIM; Kao, 2000), differing only in the genetic model assumed within the marker interval: a marked chromosome segment for CM versus a single putative QTL for RIM. Indeed, the CM likelihood reduces to the RIM likelihood when the conditional probabilities $p_j$ and $1 - p_j$ are used in place of the conditional expectations $E(I_p \mid h_j)$ and $E(I_m \mid h_j)$ in equation (6). In this study, the expected log-likelihood of the RIM model was also evaluated.
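The relationship between CM and RIM can be made concrete: both use a single normal density whose mean is shifted by half the segment (or QTL) effect times a difference of two regressors; only the regressors differ. The sketch below is a minimal illustration with hypothetical names, and the value of `p_j` is illustrative, not taken from the paper.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def regression_likelihood(y, mu, D, sigma, x_p, x_m):
    """Single-normal likelihood with mean mu + D * (x_p - x_m) / 2.

    CM:  x_p = E(I_p | h_j), x_m = E(I_m | h_j)
    RIM: x_p = p_j,          x_m = 1 - p_j   (mean becomes mu + D * (p_j - 1/2))
    """
    return normal_pdf(y, mu + 0.5 * D * (x_p - x_m), sigma)

p_j = 0.9   # illustrative value only
rim = regression_likelihood(0.1, 0.0, 0.5, 1.0, p_j, 1 - p_j)
print(rim)
```

Unlike the IM mixture, neither likelihood mixes over the unobserved QTL allele, which is the regression approximation Kao (2000) analysed.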
For the true genetic model, we considered situations in which one marker interval contains either a single QTL or several closely linked QTLs. For the latter case, five equally spaced QTLs, each with the same effect, were considered, together with three types of linkage phase in the sire: coupling (+ + + + + / – – – – –), repulsion (+ – + – + / – + – + –), and a specific phase in which the two leftmost and the three rightmost QTLs act in opposite directions (– – + + + / + + – – –). The average gene-substitution effect of the QTLs at the level of the marked segment was varied (0.1 and 0.5 in the cases reported below).
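The segment-level substitution effect for a given linkage phase can be sketched as the signed sum of the individual QTL effects, assuming (as the phase notation suggests) that the sire's maternal haplotype carries the opposite allele at every QTL. The function name is hypothetical. With a per-QTL effect of 0.1, coupling gives a segment effect of 0.5, whereas repulsion and the specific phase both give 0.1.

```python
def segment_effect(phase, a):
    """Net substitution effect of the marked segment.

    phase: alleles on the sire's paternal haplotype, e.g. "--+++"; the maternal
           haplotype is assumed to carry the opposite allele at every QTL.
    a:     substitution effect of each QTL (equal for all QTLs, as in the text).
    """
    return sum(a if c == "+" else -a for c in phase)

for name, phase in [("coupling", "+++++"), ("repulsion", "+-+-+"), ("specific", "--+++")]:
    print(name, round(segment_effect(phase, 0.1), 3))
```

This is why, for the same individual effects, the repulsion case has a much smaller segment effect than the coupling case.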
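Under the true genetic model, the likelihood conditional on marker haplotype $h_j$ depends on the number of QTLs, the type of linkage phase and the individual QTL effects, and is written as

$$L_j = L(y \mid h_j; \theta) = \sum_k P(QH_k \mid h_j)\,\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y - \mu - A_k)^2}{2\sigma^2}\right\}, \qquad (7)$$

where θ is the vector of the true parameters, $QH_k$ represents the kth QTL haplotype that the offspring can receive from the sire, $P(QH_k \mid h_j)$ is its probability given marker haplotype $h_j$, and $A_k$ is the effect of $QH_k$.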
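The true conditional likelihood $L_j$ can be sketched by enumerating every pattern of gamete origins (0 = sire's paternal, 1 = maternal) at the QTLs, weighting each pattern by its probability given the origins s and e at the two markers, and summing the corresponding normal densities. This is a minimal sketch assuming no interference, μ = 0 and σ = 1; the QTL positions below (evenly spaced at l/6, 2l/6, ..., 5l/6) are a hypothetical layout, since the text states only that the five QTLs are equally spaced. Function names are also hypothetical.

```python
import itertools
import math

def haldane_r(d):
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def trans(a, b, d):
    """Probability of origin b at a locus d Morgans from a locus with origin a."""
    r = haldane_r(d)
    return 1.0 - r if a == b else r

def qtl_haplotype_probs(s, e, qtl_pos, l):
    """P(origin pattern at the QTLs | origins s, e at the markers), no interference."""
    loci = [0.0] + list(qtl_pos) + [l]
    probs = {}
    for origins in itertools.product((0, 1), repeat=len(qtl_pos)):
        path = (s,) + origins + (e,)
        p = 1.0
        for i in range(len(path) - 1):
            p *= trans(path[i], path[i + 1], loci[i + 1] - loci[i])
        probs[origins] = p
    total = sum(probs.values())            # equals P(e | s)
    return {k: v / total for k, v in probs.items()}

def true_likelihood(y, s, e, qtl_pos, l, phase, a, mu=0.0, sigma=1.0):
    """Mixture over transmitted QTL haplotypes; each QTL adds +a/2 or -a/2."""
    dens = 0.0
    for origins, p in qtl_haplotype_probs(s, e, qtl_pos, l).items():
        A = 0.0
        for c, o in zip(phase, origins):
            sign = 1.0 if c == "+" else -1.0
            A += (sign if o == 0 else -sign) * a / 2   # paternal vs maternal allele
        dens += p * math.exp(-0.5 * ((y - mu - A) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return dens

l = 0.20
qtl_pos = [l * k / 6 for k in range(1, 6)]   # hypothetical equal spacing
for (s, e), hap in {(0, 0): "M1M2", (1, 1): "m1m2", (0, 1): "M1m2", (1, 0): "m1M2"}.items():
    print(hap, round(true_likelihood(0.0, s, e, qtl_pos, l, "--+++", 0.1), 5))
```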
Deterministic sampling (Mackinnon and Weller, 1995; Mackinnon et al., 1996; Wright and Kong, 1997; Ronin et al., 1999) was used to evaluate the expected log-likelihoods for the IM, RIM and CM procedures under the true genetic models. Rather than drawing random samples, the data were generated deterministically by stepping systematically through the values of the phenotype: the number of observations with value y in the jth marker haplotype group is $nP(h_j)L_j$, because this equals the height of the density function at y multiplied by the size of the marker haplotype group. Hence the log-likelihood contributed by all observations with value y is $nP(h_j)L_j \log \hat L_j$, and the expected total log-likelihood is obtained by summing (integrating) over all values of y:

$$E(\log L) = \sum_{j=1}^{4} n_j \int L_j \log \hat L_j\,dy, \qquad (8)$$

where $n_j = nP(h_j)$ is the number of offspring having marker haplotype j, $L_j$ is the likelihood conditional on marker haplotype j under the true genetic model, and $\hat L_j$ is the likelihood conditional on marker haplotype j under the (R)IM or CM model.
In the deterministic simulation, the marker interval was set to 20 cM and the family size to 200. The parameters μ and σ were assumed known and set to 0 and 1, respectively. For the (R)IM methods, the unknown parameters were therefore the position of, and the substitution effect for, a putative QTL; three-dimensional plots of logL as a function of these two parameters are presented to evaluate the shape of the likelihood surface in each case. For CM, the only unknown parameter was the substitution effect of the QTL(s) at the segment level, and the logL profile as a function of this parameter was examined. The likelihood was calculated over a range of ±0.1 around the true (or expected) value of each unknown parameter in increments of 0.01. The numerical integration in equation (8) was carried out over ±7σ using the subroutine QTRAP (Press et al., 1986).
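The deterministic-sampling calculation of equation (8) and the grid evaluation can be sketched for the simplest case, a single true QTL fitted by IM (the Fig. 1 setting). Assumptions, all labelled: μ = 0 and σ = 1 are known; the ±0.1 grid in steps of 0.01 is applied to both the position (in Morgans) and the effect; and a fixed-step trapezoidal rule stands in for the Numerical Recipes routine QTRAP. Function names are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def haldane_r(d):
    """Haldane map distance (Morgans) -> recombination rate."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def p_Q(d, l):
    """P(Q | h_j) for a QTL d Morgans from marker 1, keyed by marker haplotype."""
    rL, rR, R = haldane_r(d), haldane_r(l - d), haldane_r(l)
    return {"M1M2": (1 - rL) * (1 - rR) / (1 - R), "m1m2": rL * rR / (1 - R),
            "M1m2": (1 - rL) * rR / R, "m1M2": rL * (1 - rR) / R}

def phi(y, mean):
    """Normal density with sigma = 1 (sigma assumed known)."""
    return math.exp(-0.5 * (y - mean) ** 2) / SQRT_2PI

def expected_loglik(d_hat, a_hat, d_true, a_true, l=0.20, n=200, width=7.0, steps=400):
    """Expected IM log-likelihood by deterministic sampling (mu = 0, sigma = 1).

    Truth: one QTL at d_true with substitution effect a_true. The integral over y
    in [-width, width] uses a fixed-step trapezoidal rule.
    """
    R = haldane_r(l)
    P_h = {"M1M2": (1 - R) / 2, "m1m2": (1 - R) / 2, "M1m2": R / 2, "m1M2": R / 2}
    p_true, p_fit = p_Q(d_true, l), p_Q(d_hat, l)
    h = 2.0 * width / steps
    total = 0.0
    for hap, P in P_h.items():
        integral = 0.0
        for i in range(steps + 1):
            y = -width + i * h
            w = 0.5 if i in (0, steps) else 1.0        # trapezoid end weights
            L_true = p_true[hap] * phi(y, a_true / 2) + (1 - p_true[hap]) * phi(y, -a_true / 2)
            L_fit = p_fit[hap] * phi(y, a_hat / 2) + (1 - p_fit[hap]) * phi(y, -a_hat / 2)
            integral += w * L_true * math.log(L_fit)
        total += n * P * integral * h                   # n_j = n * P(h_j)
    return total

# Fig. 1 setting: one QTL in the middle of a 20-cM interval, alpha = 0.5.
d_true, a_true = 0.10, 0.5
offsets = [k / 100 for k in range(-10, 11)]             # +/-0.1 in steps of 0.01
surface = {(round(d_true + dd, 2), round(a_true + da, 2)):
           expected_loglik(d_true + dd, a_true + da, d_true, a_true)
           for dd in offsets for da in offsets}
best = max(surface, key=surface.get)
print("maximum at d =", best[0], "M, alpha =", best[1])
```

Because the expected log-likelihood equals a constant minus a weighted Kullback-Leibler divergence, its maximum falls at the true parameters whenever the fitted model contains the true one, which is the behaviour reported for Fig. 1.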
The expected log-likelihood surface for the IM procedure is shown in Fig. 1 for the true situation in which a single QTL with a substitution effect of 0.5 is located in the middle of the 20-cM marker interval. In this case, the assumption of the IM procedure matches the true genetic model. As expected, the log-likelihood was maximized when the estimated QTL position was 10 cM (from the left marker) and the estimated substitution effect was 0.5 ($\hat\alpha/2 = 0.25$), that is, when the estimates equalled the true values, confirming that the IM model fits the true genetic model well. For the true setting in which the marker interval contains five equally spaced QTLs of equal effect in coupling phase, with a substitution effect of 0.5 at the segment level, the expected logL surface for IM was similar to that for the single QTL shown in Fig. 1: it had a clear, unimodal global maximum at 10 cM for the putative QTL. Under this true genetic model, the QTL effect at the segment level was also expected to be estimated accurately, with $\hat\alpha/2 = 0.25$ (data not shown).
Fig. 1. The profile of the log-likelihood expected with the estimates of QTL-position and -effect by IM for the case where a single QTL with the substitution effect of 0.5 is located in the middle of the marker interval.
Fig. 2 concerns the true situation in which the five equally spaced QTLs of equal effect are in repulsion phase, giving a substitution effect of 0.1 at the segment level. In this setting, the IM likelihood surface showed a ridge running along the marker interval, a pattern clearly different from that in Fig. 1 and in the coupling case. Because the ridge ran parallel to the QTL-position axis, the estimates of QTL position and QTL effect were effectively independent. The estimate of the QTL effect was expected to be unbiased, with $\hat\alpha/2 = 0.05$. For the QTL position, although a slight peak in the likelihood appeared at the center of the marker interval, the ridge made practically every position within the interval an almost equally likely estimate of the location of the putative QTL. This undesirable behavior presumably arises because the net substitution effect at the segment level was small, even though each QTL had the same individual effect as in the coupling case considered above. With a larger segment-level substitution effect of 0.5, the IM likelihood surface was rather similar to those for the single-QTL and coupling cases (Fig. 1; results not shown).
Fig. 2. The profile of the log-likelihood expected with the estimates of QTL-position and -effect by IM for the case where the five equally-spaced QTLs with the substitution effect of 0.1 at the segment level within the marker interval are in repulsion phase.
The case of a specific linkage phase for the five QTLs in the marker bracket, (– – + + +), was also examined. With a segment-level substitution effect of 0.1, the IM likelihood surface again showed a ridge (Fig. 3), as in Fig. 2. Here, however, the ridge was not parallel to the $\hat d$ axis: the estimated QTL effect tended to increase as the estimated QTL position moved toward the right-hand side of the interval, indicating that the two estimates were dependent. In this case, the putative QTL was expected to be placed at 20 cM, that is, at the right flanking marker, with an apparent effect of $\hat\alpha/2 = 0.06$, whereas the true expected value for the whole segment was 0.05. With a higher segment-level substitution effect of 0.5, a larger upward bias in the estimated effect was expected, with $\hat\alpha/2 = 0.29$ against the true value of 0.25 (Fig. 4). For the RIM method, the maximum values of the log-likelihood differed only slightly from the corresponding values for IM.
Fig. 3. The profile of the log-likelihood expected with the estimates of QTL-position and -effect by IM for the case where the five equally-spaced QTLs with the substitution effect of 0.1 at the segment level within the marker interval are in a specific phase, or (– – + + +).
Fig. 4. The profile of the log-likelihood expected with the estimates of QTL-position and -effect by IM for the case where the five equally-spaced QTLs with the substitution effect of 0.5 at the segment level within the marker interval are in a specific phase, or (– – + + +).
Fig. 5 shows the results for the CM procedure for the case in which the five equally spaced QTLs within the marker interval are in the specific phase (– – + + +) with a segment-level substitution effect of 0.5, together with those for (R)IM with the QTL position fixed at the point of maximum log-likelihood, i.e., the right flanking marker. For all the true genetic models considered except those with the specific linkage phase (– – + + +), all three procedures were expected to yield unbiased estimates of the segment effect. For the (– – + + +) cases, the CM method was expected to give an unbiased estimate of the segment effect, whereas the (R)IM procedures gave biased estimates.
Fig. 5. The profiles of the log-likelihood expected with the estimates of QTL-effect by (R)IM and CM in the case where the five equally-spaced QTLs with the substitution effect of 0.5 at the segment level within the marker interval are in a specific phase, or (– – + + +). ●: (R)IM; □: CM.
The (R)IM analyses usually assume that a single QTL resides within a marker interval. However, it has been suggested that multiple linked QTLs, or clusters of QTLs, may exist in particular chromosomal regions (e.g., Khavkin and Coe, 1997, 1998), in which case the single-QTL model may not be adequate. The current results demonstrate that, in certain such situations, the parameter estimates from (R)IM analyses can be seriously biased. When the (R)IM analyses gave strongly biased estimates of the QTL effect, the putative QTL tended to be mapped at one of the flanking marker loci, presumably because the search for the QTL in this study was restricted to the marker interval.
In contrast, the results suggest that the CM method is robust in estimating the QTL effect across the various types of true genetic model considered. The likelihood function for CM is quite similar to that for the RIM model of Kao (2000), except that CM assumes a marked chromosome segment rather than a single QTL within the marker interval. Kao (2000) showed that, compared with likelihood-based IM, RIM tends to give estimates with larger mean-squared errors and smaller likelihood-ratio test statistics, and is therefore less powerful for QTL detection. Kao (2000) also found that these differences between IM and RIM grow as the proportion of the variance explained by the QTL increases, as the marker interval widens, and as the QTL position moves from the boundary toward the middle of the interval. In the current study, the maximum log-likelihood was expected to be slightly larger for IM than for RIM and CM (results not shown), in agreement with Kao (2000). However, the difference between these methods would not be large unless the QTL effects are extremely large. Whereas Kao (2000) used an operational model that exactly described the true situation of a single QTL within the marker interval, we found that, when the marker interval contains multiple linked QTLs, using an inappropriate operational model affects IM and RIM in similar ways: both procedures give similarly biased parameter estimates. These biases therefore appear to be caused by fitting a single-QTL model when multiple linked QTLs are present in the marker interval, rather than by the methodological difference between the maximum likelihood and regression procedures.
The shapes of the likelihood surfaces examined in this study indicate a dependency between the estimates of the position and the effect of the putative QTL, and the biases described above are likely associated with this dependency. Whittaker et al. (1996) stated that the estimated QTL position represents the ‘center of gravity’ of the loci within the interval that affect the trait. Our results suggest that this ‘center of gravity’ may fall in a region that contains no QTL.
Mapping studies of genetic situations in which multiple linked QTLs contribute to the genetic variance have shown that it is difficult to distinguish between the effect of a single major QTL and polygenic effects (Visscher and Haley, 1996; Liu and Dekkers, 1998). If the aim is marker-assisted selection (MAS), the benefit of distinguishing among multiple linked QTLs within one interval may not be large. With two flanking markers there are only four marker haplotypes among the offspring, even if the sire is doubly heterozygous, whereas m biallelic QTLs can form $2^m$ different haplotypes; flanking markers thus may not provide sufficient information about multiple-QTL genotypes. Moreover, loci within a short segment are rarely separated by recombination during meiosis, which makes it difficult to use recombinant gametes to select for superior multiple-QTL genotypes. In such cases, modeling multiple linked QTLs as a single putative QTL could lead to selection for a ghost QTL, as demonstrated in this study, and the resulting MAS could suffer a (serious) loss of genetic gain depending on the error in the estimated QTL position (Spelman and van Arendonk, 1997). Hence, for the purpose of MAS, the CM method considered here appears useful, since it is expected to provide an unbiased estimate of the QTL effect(s).
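The point that loci within a short segment are rarely rearranged can be quantified with a small enumeration. The sketch below assumes no interference and uses a hypothetical layout of five equally spaced QTLs inside a 20-cM interval (at l/6, ..., 5l/6); it lists the probabilities of all $2^5 = 32$ QTL haplotypes the sire can transmit and reports the share carried by its two parental haplotypes. Function names are hypothetical.

```python
import itertools
import math

def haldane_r(d):
    """Haldane map distance (Morgans) -> recombination rate."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def transmitted_haplotype_probs(qtl_pos):
    """Probabilities of the 2**m origin patterns (0 = paternal, 1 = maternal
    gamete of the sire) at m linked QTLs, assuming no interference."""
    gaps = [b - a for a, b in zip(qtl_pos, qtl_pos[1:])]
    probs = {}
    for origins in itertools.product((0, 1), repeat=len(qtl_pos)):
        p = 0.5                                    # origin at the first QTL
        for g, a, b in zip(gaps, origins, origins[1:]):
            r = haldane_r(g)
            p *= r if a != b else 1.0 - r
        probs[origins] = p
    return probs

qtl_pos = [0.20 * k / 6 for k in range(1, 6)]       # hypothetical spacing
probs = transmitted_haplotype_probs(qtl_pos)
parental = probs[(0,) * 5] + probs[(1,) * 5]
print(len(probs), "QTL haplotypes; parental share =", round(parental, 3))
```

Under this hypothetical spacing, about 0.88 of the sire's gametes carry one of its two parental QTL haplotypes intact, leaving the other 30 recombinant haplotypes to share the remainder, so recombinant gametes offer little material for selecting among linked QTLs.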
If the objective is mapping QTLs rather than MAS, detecting QTLs individually, i.e., mapping with high resolution, is obviously important. To this end, Nakamichi et al. (2001) recently proposed a mapping method based on a genetic algorithm with which multiple QTLs can be detected separately, and fine mapping was examined by Meuwissen and Goddard (2000). For detected genetic variation, it is also important to discriminate between inheritance due to a single QTL and inheritance due to a cluster of QTLs (Visscher and Haley, 1996; Liu and Dekkers, 1998).
Finally, this study considered a single half-sib family and a specific region of the genome with fixed QTL effect(s). For whole-genome studies in outbred populations with complex pedigrees, further investigation would be needed to assess in detail the validity of QTL interval mapping (Grignola et al., 1996, 1997; van Arendonk et al., 1998; George et al., 2000) and of QTL-cluster mapping using mixed-model methodology.