2020 Volume 70 Issue 5 Pages 605-616
Non-additive (dominance and epistasis) effects have remarkable influences on hybrid performance, e.g., via heterosis. Nevertheless, only additive effects are often considered in genomic predictions (GP). In this study, we demonstrated the importance of dominance effects in the prediction of hybrid performance in bioenergy sorghum [Sorghum bicolor (L.) Moench]. The dataset contained more than 400 hybrids between 200 inbred lines and two testers. The hybrids exhibited considerable heterosis in culm length and fresh weight, and the degree of heterosis was consistent with the genetic distance from the corresponding tester. The degree of heterosis was further different among subpopulations. Conversely, Brix exhibited limited heterosis. Regarding GP, we examined three statistical models and four training dataset types. In most of the dataset types, genomic best linear unbiased prediction (GBLUP) with additive effects had lower prediction accuracy than GBLUP with additive and dominance effects (GBLUP-AD) and Gaussian kernel regression (GK). The superiority of GBLUP-AD and GK depended on the level of dominance variance, which was high for culm length and fresh weight, and low for Brix. Considering subpopulations, the influence of dominance was more complex. Our findings highlight the importance of considering dominance effects in GP models for sorghum hybrid breeding.
In hybrid breeding, numerous test-crosses are evaluated for the selection of superior F1 hybrids. The crossing of parental lines and the evaluation of the hybrids are time-consuming, which are major limitations of traditional breeding approaches. Genomic prediction (GP), in which genotypic values of plants and animals are estimated based on genome-wide marker genotypes (Meuwissen et al. 2001), could accelerate hybrid breeding. GP has been validated in many hybrid crops, and the influence of various factors was evaluated, e.g., the relevance of training datasets and statistical models (Zhao et al. 2015). Particularly, genetic relatedness between training and validation datasets influences GPs of hybrid performance (Gowda et al. 2014, Liu et al. 2016, Wang et al. 2014). Nevertheless, the importance of these factors has not been demonstrated theoretically in hybrid breeding, i.e., heterosis has not been taken into account in GP of hybrid performance.
The expression of heterosis is explained by non-additive genetic effects, which consist of dominance and epistasis (Ukai 2002). Therefore, the modeling of non-additive effects seems important, particularly for traits exhibiting heterosis. Genomic best linear unbiased prediction (GBLUP) is a linear GP model (VanRaden 2008) that has been most extensively applied in various animals and plants. GBLUP can incorporate both additive and dominance effects into the model explicitly (Nishio and Satoh 2014). Conversely, Gaussian kernel regression (GK) is a nonlinear model that can implicitly model non-additive effects as well as additive effects (de los Campos et al. 2010, Gianola et al. 2006).
Previous studies have highlighted the merits of modeling dominance effects explicitly (Denis and Bouvet 2013, Nishio and Satoh 2014, Sun et al. 2013, Technow et al. 2012), while several challenges have also been reported for dominance modeling. The prediction accuracy of a model with dominance effects is sensitive to the sizes of the training datasets (Zhao et al. 2013). In addition, according to Liu et al. (2017), larger training datasets are required for the modeling of non-additive effects than datasets required for the modeling of additive effects. These studies demonstrate that a large number of hybrids could be required for the validation of the potential of models with dominance effects. Furthermore, the degree of dominance variation could influence the effects of modeling dominance (Varona et al. 2018).
In the present study, we reveal the potential of GP models considering dominance effects in hybrid crops, by comparing three models (GBLUP with additive effects [GBLUP-A], GBLUP with additive and dominance effects [GBLUP-AD], and Gaussian kernel regression [GK]). We analyzed a dataset of sorghum [Sorghum bicolor (L.) Moench]. Sorghum has recently been identified as a promising bioenergy feedstock (Brenton et al. 2016, Mathur et al. 2017, Mullet et al. 2014). Sorghum is historically classified into a few races (i.e., bicolor, caudatum, durra, guinea, and kafir). Recently, Sapkota et al. (2020) showed that racial structure can affect GP accuracy in sorghum inbred accessions. Although the classification was mainly based on phenotypes, the reclassification was advanced based on genetic relatedness (Brown et al. 2011). The dataset included more than 400 F1 hybrid lines, which were derived by crossing 200 inbred accessions with two testers. Assuming the training dataset influences the modeling of dominance effects significantly, we examined multiple types of training datasets. The degree of heterosis greatly depended on traits and subpopulations, which also influenced the superiority of models. Finally, we discuss the significance of specific statistical models and training datasets in the modeling of dominance effects.
F1 hybrids were generated from 240 sorghum inbred accessions, which are available from public germplasms (Supplemental Table 1). The passport data includes information of the origin (country), but not the racial classification. We refer to the inbred accessions as P2 lines. The P2 lines were crossed with two testers, which are designated as P1A and P1B. Both P1A and P1B were provided by EARTHNOTE Co., Ltd. (Okinawa, Japan) as inbred lines with cytoplasmic male sterility, which have generally been used for commercial F1 seed production in sorghum. The derived hybrids are classified into two groups, F1A and F1B, which were derived from P1A and P1B, respectively. In total, 422 hybrids (200 and 222 hybrids of F1A and F1B, respectively) were used for GP.
The present study aimed to facilitate the development of hybrid sorghum varieties, which are supposed to be cultivated in marginal lands for biofuel production. Field experiments were performed in Mexico (25°37ʹ N, 108°43ʹ W) for 3 years (from 2013 to 2015). The seedlings of two testers, P2 lines, and the hybrids were grown in a greenhouse for 3 weeks and transplanted into the experimental field in July. We fertilized the field using N:P:K = 17:60:50 (kg ha–1) fertilizer before the trials in each year. We had multiple replicate blocks each year. Adult plants were harvested and measured in October. We evaluated three traits (culm juice Brix, culm length, and fresh weight), which are essential for bioenergy sorghum.
To estimate the genotypic values of P2 lines and their hybrids, we applied the following linear mixed model:
where yijk is the phenotypic value of the ith genotype on the kth block in the jth year, μ is the overall mean value, gi is the random effect of the ith genotype, rj is the fixed effect of the jth year, bjk is the random effect of the kth block in the jth year, girj is the interaction term between the ith genotype and the jth year as the random effect, gibjk is the interaction term between the ith genotype and the kth block in the jth year as the random effect, and eijk is the residual. The genotypic value of the ith genotype
Heterosis is a key factor considered in hybrid breeding. The degree of heterosis can be evaluated based on mid-parent (MP) and high-parent (HP) heterosis. The values were calculated using the following equations:
where
The marker genotypes of 242 inbred accessions (including two testers) were obtained by restriction site-associated DNA sequencing (Baird et al. 2008) using the Illumina HiSeq 2000 System. The raw sequence data is available in BioProject ID PRJDB4150 and PRJNA655762. The detailed procedures of DNA extraction and library preparation have been described previously by Kobayashi et al. (2017). Sequenced reads were mapped to the sorghum reference genome sequence (Sbicolor_313_v3.0) (McCormick et al. 2018). We used Burrows-Wheeler Aligner (Li and Durbin 2009) for genome mapping and the Genome Analysis Toolkit (McKenna et al. 2010) for variant calling.
The output file from the variant calling included 2,884,062 markers. We used VCFtools (Danecek et al. 2011) and the custom scripts to select high-quality markers using the following criteria: read depth range (3–30), minor allele frequency (not less than 5%), the proportion of missing data (not more than 5%), quality value (not less than 20), heterozygosity rate (not more than 5%), and bi-allelic single nucleotide polymorphisms. Missing genotypes were imputed using Beagle 4.0 (Browning and Browning 2007). Subsequently, we filtered out (1) markers that were heterozygous in either P1A or P1B and (2) markers under high linkage disequilibrium (r2 ≥ 0.95 between two markers) on the same chromosome (retaining only the first marker). Finally, we obtained 6,619 highly reliable markers for the subsequent analyses and deduced the marker genotypes of F1 hybrids from the parents (i.e., a P2 line and a tester).
Using ADMIXTURE (Alexander et al. 2009), we estimated ancestry fractions of sorghum inbred lines with k (the number of subpopulations) = 2–9. Considering the previous report (Brown et al. 2011), we adopted k = 4 for the following analyses. We classified sorghum inbred lines (including two testers) into four subpopulations (A1–A4) which were determined based on the most proportion of ancestry fractions (e.g., an accession is classified into A1 when the proportion is A1: 40%, A2: 30%, A3: 15%, A4: 15%). In this study, mixed categories were not considered for subpopulations. In F1 hybrids, subpopulations were referred to based on the parental P2 lines (e.g., we treated an F1 hybrid as A1 when the P2 line is A1 regardless of the subpopulation of testers). To visualize the genetic relatedness between P2 lines and the two testers, we performed principal component analysis on the marker genotype data using the “prcomp” function in R.
GP modelsWe examined three GP models: (1) GBLUP-A, (2) GBLUP-AD, and (3) GK. All models were implemented using the R package “EMMREML” (Akdemir and Godfrey 2015).
GBLUP-A is described using the following formula (VanRaden 2008):
where y is a vector of genotypic values estimated from phenotypic values, μ is a mean value, Z is a design matrix for additive effects, a is a vector of additive effects of genotypes, and e is a vector of residuals [where
where A is an additive relationship matrix (Endelman and Jannink 2012). A is calculated using the following formula when A1j and A2j are two types of alleles at the jth marker site, and pj is the allele frequency of A2j:
where Ma is a matrix of n × Nm (n and Nm are the numbers of genotypes and markers, respectively). The factor of Ma of the ith genotype at the jth marker is determined using the following formula:
GBLUP-AD is described using the following formula (Nishio and Satoh 2014):
where Z is a design matrix for dominance effects, and d is a vector of dominance effects of genotypes. The dominance effect d is assumed to have a Gaussian distribution:
where D is a dominance genomic relationship matrix, described as follows:
where Md is a matrix of n × Nm. The Md factor of the ith genotype at the jth marker follows the next formula:
In GBLUP-AD, three genotypes (A1A1, A1A2, and A2A2) at each marker were required to estimate a and d explicitly, which can be confounded when only two genotypes are included in the training dataset.
GK is described using the following formulas (de los Campos et al. 2010, Gianola et al. 2006):
where u is a random genetic effect vector with a Gaussian distribution:
The Gaussian kernel K follows the next formula:
where h is the bandwidth parameter for controlling genetic covariance, and S is a square matrix of n × n. In S, the factor on the ith row and the jth column is a standardized value of the Euclidean distance between the ith genotype and the jth genotype:
where M is a matrix of n genotypes × Nm markers, described as [–1, 0, 1]. We set h = 1/q, where q is the median squared value of S (computed using off-diagonals only). S was also used as the genetic distance from the testers.
Training dataset typesEach hybrid group (F1A and F1B) was used for the validation of GP accuracy (Fig. 1). We designated the validated hybrid group as “Target”, whereas another (not validated) hybrid group was designated as “Non-target”. We examined four types of training datasets named Type-TN, Type-TI, Type-NI, and Type-TNI. In Type-TN, both “Target” and “Non-target” hybrid groups were included in the training dataset. In Type-TI, “Target” and P2 lines were used as the training dataset. Conversely, in Type-NI, “Non-target” and P2 lines were used for training. In Type-TNI, “Target”, “Non-target”, and P2 lines were all included in the training dataset.
Types of training datasets used for genomic prediction. The accuracy of genomic prediction (GP) was evaluated using the cross-validation approach for a target hybrid group (F1A or F1B). The four types of training datasets were different combinations of F1A, F1B, and paternal parent lines (P2). We described the hybrid group used for validation as the “Target”, whereas the other hybrid group was the “Non-target”. For example, in Type-TI (Type-Target and Inbred), F1A, together with P2, was used for training GP models when F1A was the target hybrid group. In contrast, in Type-NI (Type-Non-target and Inbred), F1B, together with P2, was used for training GP models when F1A was the target hybrid group.
GP was performed with three approaches: (1) all-subpopulations, (2) within-subpopulation, and (3) across-subpopulation (Supplemental Fig. 1). In the first approach, both training and validation datasets included all four subpopulations. In the second approach, both training and validation datasets always comprised a subpopulation and the prediction was performed for each subpopulation. In the third approach, the validation datasets comprised a subpopulation, whereas the training datasets included the other subpopulations which were not used for validation.
In the first and second approaches, the GP accuracy was evaluated via 10-fold cross-validation (CV). The validation dataset was divided randomly into 10 subsets. Nine subsets were used for training a GP model, whereas the remaining one was used for validating the model. The process was repeated 10 times until all subsets were validated. In the 10-fold CV, genotypes related to “Target” hybrids that were included in a validation subset were not used for the training. For example, P2C (a P2 line) and P1A × P2C are the related genotypes when P1B × P2C is included in a validation subset.
In the third approach, a different evaluation method was adopted because the training and validation datasets were derived from different subpopulations. For each prediction, we randomly selected 90% of the genotypes from the total training dataset size (i.e., 90 genotypes were randomly selected when the total dataset included 100 genotypes from three subpopulations for training). All target hybrids of the validated subpopulation were predicted in each prediction replicate.
Prediction accuracy was the correlation coefficient between genotypic values
To dissect the relationships among GP accuracy, models, and genetic architecture, we estimated the genetic variances of the target traits. In addition, to estimate the genetic variances, we considered the following four group combinations: “F1A and F1B”, “F1A and P2”, “F1B and P2”, and “F1A, F1B, and P2”. The four combinations correspond to the four types of training datasets in GP (Type-TN, Type-TI or Type-NI, and Type-TNI). Unlike the 10-fold CV for the evaluation of GP accuracy, we simultaneously fitted all genotypes belonging to each combination to a model, and obtained the genetic variance estimates
To analyze the genetic relationships among inbred lines (P1A, P1B, and 240 P2 lines), we examined the extent of genetic admixture using some subpopulation values (k = 2–9). The inbred lines were classified into distinct groups, whereas some accessions were considerably admixed (Fig. 2A). Because Brown et al. (2011) indicated that the number of subpopulations k = 4 better reflected a sorghum racial classification, we also used a classification of four subpopulations (A1–A4) based on the result with k = 4 for the following analyses.
Genetic variations in sorghum inbred accessions. (A) Ancestry fractions with k (the number of subpopulations) = 2–9. (B) Principal component analysis of marker genotype data. P1A and P1B are the maternal parents (testers) of two hybrid groups, F1A and F1B, respectively. P2 lines are the paternal parents of the two hybrid groups. The subpopulations are classified with k = 4.
Principal component analysis on marker genotype data represented the genetic differences among four subpopulations, which were sufficiently explained by PC1–4 (Fig. 2B). P1A and P1B belonged to different subpopulations (A1 and A3, respectively), indicating the distant relatedness.
Phenotypic variations of P2 lines and the F1 hybridsWe calculated adjusted phenotypic values of P2 lines and the F1 hybrids across 3 years with a linear mixed model (Supplemental Fig. 2). Both F1 hybrids (F1A and F1B), which were each derived from P1A and P1B (as the tester), represented higher values than P2 lines did in three traits (Brix, culm length, and fresh weight) (Fig. 3A). The phenotypic differences among the four subpopulations (based on the parental P2 line in F1A and F1B) were also considerably observed (e.g., A1 lines averagely exhibited shorter culms than A2 lines did across P2, F1A, and F1B).
Phenotypic differences among subpopulations and phenotypic correlations among P2 lines and their F1 hybrids. Adjusted phenotypic values are shown. (A) Phenotypic variations in the three traits in the P2, F1A, and F1B groups. Lowercase letters (a, b, and c) above box plots indicate significant differences among the four subpopulations by the Tukey’s test (P < 0.01). Uppercase letters (A and B) indicate significant differences among P2, F1A, and F1B groups in each trait (P < 0.01). Blue dashed lines represent phenotypic values of P1A, whereas red dotted lines correspond to P1B. (B) Phenotypic correlations among paternal parent lines (P2) and their hybrids (F1A and F1B). Yellow and light green solid lines represent the boundaries of high-parent and mid-parent heterosis, respectively.
We examined phenotypic correlations based on parental P2 lines among the three groups (Fig. 3B). F1A and P2 exhibited high phenotypic correlations across the three traits (Brix, 0.70; culm length, 0.61; fresh weight, 0.76). F1B also exhibited moderate phenotypic correlations with P2 with regard to culm juice Brix (0.60) and culm length (0.44), whereas the phenotypic correlation of fresh weight was low (0.21). Numerous hybrids in F1A and F1B exhibited HP heterosis in culm length and fresh weight, whereas limited hybrids exhibited heterosis in culm juice Brix. The phenotypic correlations between F1A and F1B from the same P2 line were low with regard to culm length (0.11) and fresh weight (0.18) and were relatively high with regard to culm juice Brix (0.68).
Heterosis in F1 hybridsSubsequently, we calculated the degree of HP heterosis in each hybrid group. Along the genetic distance from each tester, the two hybrid groups exhibited considerable HP heterosis in culm length and fresh weight (Fig. 4A). In culm juice Brix, positive HP heterosis was not observed in both hybrid groups. Furthermore, the directions of the relationships between the genetic distances from testers and the degrees of heterosis between F1A and F1B were different, being negative and positive, respectively.
Influence of genetic distance and subpopulation on HP heterosis. (A) The relationship between genetic distance from the tester and the degree of HP heterosis. The genetic distance from testers was estimated as a standardized value of the Euclidean distance in the marker genotypes, S. The regression curves, which are shown as yellow solid curves with the standard errors (gray regions), were calculated by locally estimated scatterplot smoothing. (B) HP heterosis in each subpopulation. Lowercase letters (a, b, and c) above box plots indicate significant differences among the four subpopulations by the Tukey’s test (P < 0.01). Uppercase letters (A and B) indicate significant differences between the two tester groups (i.e., F1A and F1B) in each trait (P < 0.01).
The degree of HP heterosis was different between the two F1 groups, which was large using P1B as the tester in Brix and culm length (Fig. 4B). Furthermore, the difference in subpopulations affected HP heterosis in both F1 groups. Using P1A as the tester, A3 exhibited larger HP heterosis in culm length and fresh weight than did the other hybrids. In contrast, HP heterosis was relatively large in A1 and A4 using P1B as the tester.
GP accuracy in the four types of training datasetsFirst, we performed GP using the all-subpopulations approach. In Type-TN, the accuracy of GBLUP-AD and GK was always higher than the accuracy of GBLUP-A across all three traits (Fig. 5). Furthermore, GBLUP-AD had higher prediction accuracies than GK in culm length. In Type-TI, GBLUP-AD had much greater accuracy than GBLUP-A, although the accuracy of GK was superior to that of GBLUP-AD. In contrast, GBLUP-A outperformed GBLUP-AD and GK under Type-NI in culm length and fresh weight. In Type-TNI, GBLUP-AD had a higher predictive ability than GBLUP-A, except when targeting fresh weight under F1A. Similarly, GK outperformed GBLUP-A with Type-TNI in all traits. Across the four training types, the prediction accuracies of GBLUP-AD and GK were sometimes comparable with regard to culm length and fresh weight, whereas GK always had higher prediction accuracies than those of GBLUP-AD in culm juice Brix.
Genomic prediction (GP) using the all-subpopulations approach. Type-TN, the target and non-target hybrid groups; Type-TI, the target hybrid group and paternal parent P2 lines; Type-NI, the non-target hybrid group and P2; Type-TNI, the target and non-target hybrid groups and P2. GP accuracy was evaluated by the correlation coefficient (r) between genotypic values estimated from phenotypic data and genotypic values predicted from the marker genotype data. The values of GP accuracy are represented by the means of 100 replicates with standard error bars. Lowercase letters (a, b, and c) above bar plots indicate significant differences among the three models (GBLUP-A, GBLUP-AD, and GK) by the Tukey’s test (P < 0.01). GBLUP-A: Genomic best linear unbiased prediction with additive effects; GBLUP-AD: GBLUP with additive and dominance effects; GK: nonlinear Gaussian kernel regression.
The results of the all-subpopulations approach were summarized in each population (Supplemental Figs. 3, 4). The model performances were generally different among subpopulations, and the superiorities of GBLUP-AD to GBLUP-A were mostly specific to subpopulations (e.g., in Brix, GBLUP-A represented higher accuracies for A1, whereas GBLUP-AD was generally better for A4).
Second, we predicted the performance in each subpopulation (the within-subpopulation approach). The prediction accuracies were considerably different among subpopulations, traits, and testers (Supplemental Figs. 5, 6). For the subpopulation A4, GBLUP-A represented lower accuracies than GBLUP-AD and GK in some cases (e.g., Brix in F1A and culm length in F1B). For the other subpopulations, there were no universally superior GP models across different subpopulations and training dataset types.
Third, each subpopulation was predicted with the training datasets from different subpopulations (the across-subpopulation approach). Low prediction accuracies regardless of the GP models were observed in many cases (particularly, in F1B) (Supplemental Figs. 7, 8). Nevertheless, GBLUP-AD represented better accuracies than did GBLUP-A in some cases (e.g., Brix in A4 across the two F1 groups).
The comparison between GBLUP-A and GBLUP-AD across the three approaches is summarized in Fig. 6. The superiority of GBLUP-AD to GBLUP-A was found across the three approaches, which depended on the relationships among traits, training dataset types, F1 groups, and subpopulations. Specifically, using Type-TN in Brix, the superiority of GBLUP-AD was observed with the all-subpopulations approach, whereas the differences between the two models were not substantially in both within- and across-subpopulation approaches. We also summarized the performance of the GK model (Supplemental Fig. 9). The superiority of the GK model was found across the three approaches. However, superiority was limited to specific training dataset types or subpopulations when the across-subpopulation approach was used (e.g., in Brix, Type-TN, or A3).
Comparison between GBLUP-A and GBLUP-AD across the three GP approaches. The predictions in which GBLUP-A outperformed GBLUP-AD in accuracy are labeled with red, whereas blue labels represent cases where GBLUP-AD is better. The background color indicates that the difference between GBLUP-A and GBLUP-AD is not significant.
To further characterize the GP models, we estimated genetic variance based on four datasets comprising multiple groups, which corresponded to the four types of training datasets. The total genetic variance varied among datasets even though the same model was used for the estimation (Fig. 7). Although the additive and dominance variances were estimated explicitly using only GBLUP-AD, the ratios of dominance variance depended on the datasets. When using both F1A and F1B as the datasets across all subpopulations, the dominance variance accounted for more than half of the genetic variance in culm length and fresh weight. Conversely, the dataset with P2 and either F1A or F1B varied considerably from the dataset with P2 and another F1 group. In Brix, the degree of the dominance variance was relatively low for any dataset.
Estimated genetic variance in additive, dominance, and non-linear effects
Each subpopulation exhibited different characteristics in the genetic variance estimates. Within A1, the total genetic variance was small when using the dataset with F1A and F1B in any trait. In A2, the total genetic variance in culm length and fresh weight was smaller than that in the other subpopulations, whereas it was large in Brix. In A3, the dominance variance was highly limited in Brix even though all groups (F1A, F1B, and P2) were used for the estimation. On the contrary, in A4, the proportion of dominance was considerably large in Brix, whereas the estimates in culm length and fresh weight changed with different datasets.
GP is a promising approach for evaluating the performance of numerous hybrids, which would be arduous and time-consuming when performed using traditional breeding techniques (Zhao et al. 2015). The prediction of heterosis is critical for the selection of superior hybrids. We analyzed more than 400 sorghum hybrids, which exhibited significant levels of heterosis (Fig. 3). Subpopulations affected HP heterosis in addition to the genetic distance from testers (Fig. 4). Particularly, culm length and fresh weight exhibited HP heterosis across the two hybrid groups evaluated. Based on the classical theory of quantitative genetics, heterosis is explained by non-additive genetic effects, such as dominance and epistasis (Ukai 2002). Dominance effects contribute to heterosis in numerous agronomic traits (Frascaroli et al. 2007, Li et al. 2008, Radoev et al. 2008, Xiao et al. 1995, Yang et al. 2017), indicating that capturing dominance effects is essential for GP of hybrid performance.
We examined the influence of dominance effects on GP based on three models: GBLUP-A, GBLUP-AD, and GK. Dominance effects were incorporated explicitly into GBLUP-AD (Nishio and Satoh 2014), and implicitly into GK as part of non-linear effects (Gianola et al. 2006, Gianola and van Kaam 2008). Using the all-subpopulations approach, GBLUP-AD and GK generally had higher prediction accuracy than GBLUP-A, excluding Type-NI (Fig. 5). The lower accuracy of GBLUP-A suggests the importance of non-additive effects in the GP of hybrid performance. Technow et al. (2012) also demonstrated that GP accuracy could be improved by incorporating dominance effects into models. The results reveal that GP models based on only additive effects are not appropriate for the prediction of hybrid performance.
In some cases, GBLUP-AD had greater GP accuracy than GK in culm length and fresh weight. The results were generally consistent with the size of the dominance variance. In other words, the explicit modeling of dominance effects is more effective than implicit modeling, particularly when the dominance variance is large (Fig. 7). Because the merits of explicit modeling of dominance depend on the genetic architecture of a target trait (Nishio and Satoh 2014), the difference in GP accuracy among models can be small when the trait has no dominance variation. In fact, disregarding the dominance effects could increase GP accuracy in wheat (Zhao et al. 2013). However, additive models are inappropriate in the GP of hybrid performance because heterosis prediction is essential for the selection of superior hybrids. The modeling of non-additive effects should be considered in hybrid breeding despite the limited improvement in GP accuracy (Varona et al. 2018).
In the Type-NI dataset, both GBLUP-AD and GK had lower GP accuracy than GBLUP-A in culm length and fresh weight (Fig. 5). The results suggest that modeling dominance effects, whether explicitly or implicitly, is not suitable for specific training datasets. The phenotypic correlations in culm length and fresh weight between a pair of hybrids derived from the same P2 line were low (Fig. 3B), indicating that the genetic architecture of the traits was quite different between the two hybrid groups. The results imply that segregated quantitative trait loci between the two testers could have considerable dominance effects. The level of genetic variance and the ratio of dominance varied considerably between the two datasets, “F1A and P2” and “F1B and P2” (Fig. 7), which is consistent with the hypothesis.
In addition to testers, the genetic differentiation among P2 lines, which were summarized by subpopulations, also affected the estimates of genetic variances. We found that the genetic variances reflected phenotypic variations and HP heterosis of each subpopulation to some extent. For example, the genetic variances of culm length and fresh weight were relatively small in A2 using any dataset type, whereas both phenotypic variations and the degree of HP heterosis were also small (Figs. 3, 4). In contrast, the relationships among the dominance variance, HP heterosis, and dataset types were not necessarily straightforward. Although the dominance variance considerably contributed to A4 with any dataset (except culm length with the F1A and F1B datasets), HP heterosis was limited with the tester P1A. These results suggested that the influence of subpopulations on the modeling of dominance needs to be carefully considered in each dataset.
The sorghum subpopulations based on genotypes reflect most historical classes and more precise relatedness (Brown et al. 2011). The results of our principal component analysis were very similar to these results (Fig. 2B), indicating that subpopulations are meaningful across diverse sorghum accessions. The comparisons of GP models were quite complex when subpopulations were considered in the all-subpopulations approach or the two approaches considering subpopulations (within-subpopulation and across-subpopulation) were used. Different from the validation that did not consider subpopulations, the superiority of GBLUP-AD over GBLUP-A was limited to specific cases (Fig. 6). As mentioned earlier, the degree of HP heterosis might not be fully related to the merit of dominance GP models. In the across-subpopulation approach, the influence of population structure might be rather larger than that in GP models because the prediction accuracy was much lower than with the other approaches in some cases (Supplemental Figs. 7, 8). The prediction accuracy can decline owing to a small training dataset size or population structure (Sapkota et al. 2020, Technow et al. 2013). Particularly, larger datasets are necessary for the modeling of dominance (Zhao et al. 2013). The effect of heterotic groups should also be considered for maximizing prediction accuracy, such as that observed in maize (Technow et al. 2014).
Heterosis is positively correlated with genetic distance (Betrán et al. 2003, Krystkowiak et al. 2009). Our results also revealed the positive relationships between genetic distance from the tester and the degree of HP heterosis in the two traits (Fig. 4A). Furthermore, the degree of heterosis was different among subpopulations even though the parental lines had similar genetic distances from the testers (Fig. 4B). These results suggest that heterotic groups should be considered for maximizing the performance of F1 hybrid sorghums. Unfortunately, the relationship between GP models and training dataset types was quite complex when subpopulations were considered (Supplemental Figs. 3–8). Further studies are needed to incorporate the influence of heterotic groups into GP models. The use of combined training datasets (e.g., the all-subpopulations approach) may currently be promising for general breeding programs (Sapkota et al. 2020, Technow et al. 2013). According to these results, the relevance of statistical models, the design of training datasets, and population structure should be taken into account in GP of hybrids rather than relying on individual factors.
Regarding culm juice Brix, GK with the all-subpopulations approach always outperformed not only GBLUP-A but also GBLUP-AD (Fig. 5), suggesting the contribution of non-additive effects other than dominance, i.e., epistasis. In sorghum, epistatic effects contribute to sugar content-related traits (Ritter et al. 2008). Because epistatic effects can influence the expression of heterosis (Jiang et al. 2017), the modeling of epistasis is also considered in GP (Jiang and Reif 2015). However, our results indicated that GK might be unsuitable for the across-subpopulation approach (Supplemental Fig. 9). Although the balance between prediction accuracy and overfitting is not necessarily straightforward in GP, GK can be more overfitting than other GP models (Heslot et al. 2012). The performance of the GK model may depend on some related factors, such as training datasets and population structure.
The contributions of additive, dominance, and epistatic components generally vary among traits, indicating that traits determine the appropriate models for predicting hybrid performance. Recent studies have demonstrated that the superiority of GP models for hybrid performance varies among traits (Alves et al. 2019, Kadam and Lorenz 2019). One model may not be suitable for all traits, although Zhao et al. (2015) argued that the model choice had no substantial effect on GP accuracy. Therefore, an optimal GP model should be selected empirically for each target trait, in addition to adopting appropriate designs for training datasets and population structure.
MI, TT, TF, NT, and HI designed the study; MI, TH, KY, HT, MF, JY, TF, NT, and HI cultivated and measured phenotypic data; MI and HK-K prepared the RAD-seq data; MI and TH analyzed data; MI interpreted results and drafted the manuscript; NT and HI supervised the study. All authors read and approved the final manuscript.
The research was funded by CREST, Japan Science and Technology Agency, Japan (research subject: “Novel techniques of tailor-made breeding for energy crop improvement using high-throughput genotyping”) and was partially supported by the Japan Society for the Promotion of Science KAKENHI (Grant 17H01457). The authors appreciate the technical assistance of the members of the Laboratory of Biometry and Bioinformatics and the Laboratory of Plant Molecular Genetics at the University of Tokyo.