Breeding Science
Online ISSN : 1347-3735
Print ISSN : 1344-7610
ISSN-L : 1344-7610
Research Papers
Optimal set of microsatellite markers required to detect illegitimate progenies in selected oil palm (Elaeis guineensis Jacq.) breeding crosses
Siti Hazirah Zolkafli, Maizura Ithnin, Kuang-Lim Chan, Mohd Isa Zainol Abidin, Ismanizan Ismail, Ngoot Chin Ting, Leslie Cheng-Li Ooi, Rajinder Singh

2021, Volume 71, Issue 2, pp. 253-260

Abstract

Oil palm is continually being improved via controlled crossing of selected palms to ensure sustainable yields and productivity. As such, correct parental assignment is important as the presence of illegitimates will compromise the progress of improvement. In the present study, we determined the optimal number of microsatellite (SSR) markers for detection of illegitimates in selected oil palm crosses with high confidence. Determining the optimal number of markers to assign parentage will ensure that the DNA fingerprinting will be cost effective for routine use as a quality control tool in oil palm improvement programs. Here, we evaluated a wide range of crosses that included a cross derived from wild germplasm palm. The results revealed that markers with high PIC are informative and detect most of the alleles present in a cross, including those exhibited by the illegitimates. A larger number of optimum sets of markers are needed to detect all illegitimates for crosses with higher levels of genetic diversity. The optimal number of polymorphic SSR markers determined in the present study can ensure that appropriate quality control is implemented for oil palm improvement programs.

Introduction

The oil palm (Elaeis guineensis) is an insect-pollinated, broad-leaved tree species and the most productive oil crop, giving 3–8 times the yield of other crops (Barcelos et al. 2015). In 2016, the oil palm area in Malaysia was 5.7 million hectares (Kushairi et al. 2018), making it the premier crop in the country. The commercially planted oil palm is the tenera (T), which has thin-shelled fruits. It is produced by crossing dura (D) and pisifera (P). The main commercial product from the crop is the oil from the fleshy fruit mesocarp, and a thicker shell reduces the amount of mesocarp borne. The percentage of mesocarp in the dura fruit is 60% and in pisifera 90%. It would seem that pisifera is the fruit type to plant, but it is largely female sterile, that is, it hardly bears fruits. Thus, to produce tenera, pisifera is used as the male parent and dura as the mother palm, although in the rare successful reciprocal cross the tenera produced is as good. Tenera has an intermediate mesocarp content of ~80% and is used for commercial cultivation. The thick-shelled duras used in breeding programs and commercial seed production in Southeast Asia originated from Deli, a province in Sumatra. As for the pisiferas, oil palm breeders worked on genetic materials from wider sources such as AVROS (Algemeene Vereniging van Rubberplanters ter Oostkust van Sumatra), Ekona, La Me and Yangambi, which are also being utilized in West African countries such as Zaire, Ivory Coast, and Nigeria.

In oil palm breeding, selected oil palms that fulfil specific criteria are routinely crossed under stringent controlled pollination to ensure that the process is not contaminated by stray pollen. Despite the stringent procedure, occasional slip-ups occur (Chee et al. 2015, Hama-Ali and Tan 2014, Hama-Ali et al. 2015, Thongthawee et al. 2010), arising, for example, from torn bags that let in other pollen (Corley 2005), the biology of the oil palm flower and, simply, human error (Budiman et al. 2019). Similar problems in maintaining quality control of crosses have also been reported for cocoa (Padi et al. 2015), maritime pine (Plomion et al. 2001), loblolly pine (Grattapaglia et al. 2014) and Scots pine (Pinus sylvestris L.) (Torimaru et al. 2009). Any illegitimate individuals will compromise the selection and production process. In cacao, for example, the presence of illegitimates significantly altered the estimated heritability of selected traits (Duval et al. 2017). It is therefore essential to exclude illegitimates from controlled crosses. The question is how best to do it. In oil palm commercial seed production, the fruit form can be used to detect illegitimacy (Corley 2005). As only the thin-shelled tenera is expected from D×P crosses, any dura or pisifera would indicate illegitimacy. However, the fruit form can only be discerned when the palm fruits, some 3 to 4 years after field planting. Much time and resources would by then have been wasted. It would be better if earlier detection were possible.

Molecular markers offer a practical solution: any illegitimate palm will show an allelic profile that differs from that of its purported parents (Corley 2005). Legitimacy testing can therefore be performed as early as the germination stage, after roots and/or small leaves have emerged from the seeds. This greatly helps the oil palm industry in ensuring that only legitimate materials are planted in experimental plots and commercial estates. Nevertheless, implementing molecular markers for legitimacy testing will incur additional cost to assess the thousands of seeds generated routinely from hundreds of crosses made by oil palm breeders. It is therefore essential to determine the most efficient set of molecular markers needed for determining parent-offspring relationships in oil palm, so that the gain from testing outweighs the cost involved. In an earlier study, Thongthawee et al. (2010) suggested that up to 8 loci are sufficient to detect errors in pollination of controlled crosses in oil palm. Hama-Ali et al. (2015) opined that up to 16 loci are required for correct assignment of sibs to a particular family when parental genotypes are not available. However, both studies utilized only advanced breeding lines and did not evaluate germplasm materials, which may have additional alleles present.

In the present study, samples from advanced breeding crosses and germplasm were utilized in parentage analyses so that the optimized set of markers can be determined in most existing oil palm crosses. Thus, the main objective of our study is to estimate the progeny-specific optimum sets of SSR markers required for identification of illegitimates in a wide range of oil palm crosses. Such information is important in the allocation of resources to implement DNA-based testing to remove spurious palms in controlled crosses.

Materials and Methods

Plant materials

In the present study we analyzed four bi-parental crosses and one selfed cross derived from different parental palms, as listed in Table 1 with their genetic backgrounds. KT, PUP, PRC and T×P are bi-parental crosses, each produced from two different parental palms, whereas T128 was created by selfing a Nigerian germplasm palm. In the present study, the terms ‘cross’ and ‘family’ are used interchangeably to refer to the set of palms derived from crossing the selected palms. Young leaflets were harvested from the parental palms as well as from individual palms of each cross. Total DNA was extracted based on the method of Doyle and Doyle (1990), and its quality was tested by enzyme digestion using EcoRI (6-base-cutter) and HaeIII (4-base-cutter) as described by Rahimah et al. (2006). The concentration and purity of the DNA were determined using a spectrophotometer, recording the optical densities (OD) at 260 nm and 280 nm. All parents were included in the analysis; thus, DNA from all parents as well as their derived crosses was genotyped with SSR markers.

Table 1. Summary of minimum number of markers needed to detect illegitimates

| Family/Cross | No. of palms | Genetic background | No. of illegitimates* | Minimal no. of markers to detect all illegitimates |
|---|---|---|---|---|
| KT | 161 | Dura × Pisifera | 21 | 3 |
| PRC | 50 | Dura × Pisifera | 49 | 2 |
| PUP | 47 | Dura × Pisifera | 4 | 2 |
| T×P | 30 | Tenera × Pisifera | 18 | 4 |
| T128 | 35 | Tenera Selfed | 6 | 2 |

* based on CERVUS at 95% confidence level.

SSR amplification

Sixty SSRs (Supplemental Table 1), comprising those developed by Billotte et al. (2005), reported by Ting et al. (2013) and available at the Malaysian Palm Oil Board (MPOB), were screened across a panel that contained representatives of each cross. These SSR markers have been used for construction of oil palm genetic linkage maps and are evenly distributed across 16 linkage groups (Billotte et al. 2010). For screening, PCR amplification was performed using 2 μL of 50 ng/μL DNA sample in the following PCR mixture: 4.7 μL Milli-Q water, 1.0 μL 10× PCR buffer, 0.2 μL dNTPs, 1.0 μL of forward and reverse primers (100 μM) and 0.1 μL of 5 U/μL Taq polymerase (New England Biolabs). The amplification was carried out using an Applied Biosystems thermocycler (GeneAmp PCR System 9700) as follows: denaturation at 95°C for 30 s, annealing at 50–57°C (depending on the primers), extension at 72°C for 30 s and final extension at 72°C for 5 minutes. For the preliminary screening of polymorphic SSR markers, PCR products were visualized in agarose gels to ensure that fragments of the appropriate size were obtained. The PCR products were subjected to electrophoresis on 4% superfine resolution agarose gels at 120 V for 4 hours. Bands were visualized under UV light after ethidium bromide staining. SSR primers that did not generate products of the expected size were excluded. Polymorphic SSRs identified from the gels were subsequently used for genotyping the five crosses.

For genotyping, 2 μL DNA samples at 50 ng/μL concentration were mixed with one of four fluorescent dyes (HEX, FAM, PET or NED), 0.025 μL M13-tailed forward and untailed reverse SSR primers, 0.1 μL of 5 U/μL Taq DNA polymerase, 0.2 μL of 10 mM dNTPs and 1 μL of 10× PCR buffer (New England BioLabs). PCR amplification was performed using the same PCR program described above, with a longer final extension time of 35 minutes. The PCR products were subjected to fragment analysis in an Applied Biosystems capillary sequencer (3730xl DNA Analyzer). Data from the sequencer were analyzed using GeneMapper v4.1 software (Applied Biosystems). Fragment sizes were estimated using GeneScan™-500 LIZ® as reference (Applied Biosystems). The products were coded according to Billotte et al. (2005) (Supplemental Table 2). Amplified products that satisfied Mendelian segregation ratios were scored and used for parentage analysis.

Parentage analysis

In the present study, we analyzed crosses that were derived from known parental palms. We applied CERVUS version 3.0 software (Kalinowski et al. 2007), which implements categorical allocation whereby offspring are assigned to the candidate parent, either single or double, with the highest likelihood (LOD) of being the true parent(s). Genotypic data of parental palms and their progenies were used in allele frequency analysis, which generated the number of alleles, polymorphism information content (PIC), expected heterozygosity (HExp), observed heterozygosity (HObs) (Nei 1987), frequency of null alleles and non-exclusion probability for each of the markers. This information was used in simulation analysis to determine the threshold LOD values at the 95% confidence level. Palms with LOD values below the threshold were considered illegitimates. In the simulation, we applied the following parameters: 1,000 progenies, 1% mistyped loci, 1% error rate (these are default values recommended by CERVUS software to reduce exclusions resulting from laboratory error) and 0.5 proportion sampled (only the actual parents were sampled and the parent of the illegitimate is unknown). The minimum number of loci typed was set at half the total number of loci before LOD scores for each progeny were calculated. The relationship between the number of mismatching loci and the LOD score was examined for each cross. Once the illegitimates were determined, we removed them from the crosses and carried out allele frequency analysis again on the “legitimate family” using CERVUS software.
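As an illustration of two of the per-locus statistics named above, the following minimal Python sketch computes expected heterozygosity and PIC from genotype data. It is not the CERVUS implementation: it assumes the textbook forms HExp = 1 − Σp² (without any small-sample correction) and the PIC formula of Botstein et al. (1980), and the function names and the genotype input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus.

    genotypes: list of (allele, allele) tuples, one per diploid palm
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """HExp = 1 - sum(p_i^2); no small-sample correction is applied here."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    homozygosity = sum(p * p for p in ps)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - homozygosity - cross_term


# Toy locus with two equally frequent alleles (illustrative data only).
toy_freqs = allele_frequencies([("a", "b"), ("a", "b")])
toy_hexp = expected_heterozygosity(toy_freqs)  # 0.5
toy_pic = pic(toy_freqs)                       # 0.375
```

For a locus with two equally frequent alleles the sketch gives HExp = 0.5 and PIC = 0.375, which shows why PIC is always somewhat lower than HExp for the same locus.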

The values obtained from CERVUS software were further analyzed using an in-house Perl script that determined the number of illegitimates for all possible combinations of informative markers. We applied two methods to determine the final number of illegitimates in the crosses. First, we selected the best marker combinations that identify all illegitimates. The number of combinations is denoted as C(n, k), where n = total number of markers in the particular cross and k = number of markers selected, k = {1, 2, ..., n}. Theoretically, when more markers are added, more illegitimates are discovered. However, beyond a certain point, the number of illegitimates detected remains constant (as all have been covered), even with increasing marker numbers. This allows us to determine the optimum sets of SSRs required to identify all illegitimates in the crosses, as well as the specific markers that detect all illegitimates in the respective crosses. Secondly, we averaged the number of illegitimates detected over all marker combinations for each k value, and estimated the standard deviation for each k value. The number of informative markers identified for the KT cross was relatively high, about twice that of the other crosses. Due to the limited computing power of the server, we could only determine the average number of illegitimates for k = 1 to k = 10 and k = 40 for this cross. As such, curve fitting was performed by applying the “Moving Average” in MS Excel 2019 to show the trendline for k = 11 to 39.
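The exhaustive search described above can be sketched in Python as follows. This is not the authors' Perl script: the input format (a mapping from each marker to the set of progeny it flags as mismatching), the function names and the toy data are assumptions made for illustration, and the population standard deviation is used because the text does not state which form was applied.

```python
from itertools import combinations
from math import comb
from statistics import mean, pstdev


def detection_by_k(flags):
    """For each k, evaluate every k-marker combination.

    flags: dict mapping marker name -> set of progeny IDs that the marker
    flags as illegitimate (hypothetical input format).
    Returns {k: {"best", "best_n", "mean", "sd"}}.
    """
    markers = sorted(flags)
    summary = {}
    for k in range(1, len(markers) + 1):
        best_combo, best_n, counts = None, -1, []
        for combo in combinations(markers, k):
            # Illegitimates detected by a combination = union over its markers.
            n_detected = len(set().union(*(flags[m] for m in combo)))
            counts.append(n_detected)
            if n_detected > best_n:
                best_combo, best_n = combo, n_detected
        summary[k] = {
            "best": best_combo,
            "best_n": best_n,
            "mean": mean(counts),
            "sd": pstdev(counts),  # population SD; an assumption of this sketch
        }
    return summary


def optimal_k(summary, total_illegitimates):
    """Smallest k whose best combination detects every illegitimate."""
    for k in sorted(summary):
        if summary[k]["best_n"] == total_illegitimates:
            return k
    return None


# Toy example: four markers, three illegitimate progeny (IDs 1-3).
toy_flags = {"M1": {1, 2}, "M2": {3}, "M3": {2, 3}, "M4": set()}
toy_summary = detection_by_k(toy_flags)
toy_optimum = optimal_k(toy_summary, total_illegitimates=3)  # 2

# Size of the full search space for 40 markers: 2**40 - 1 combinations,
# which is consistent with exhaustive evaluation being impractical for KT.
kt_search_space = sum(comb(40, k) for k in range(1, 41))
```

In the toy data no single marker flags all three illegitimates, but the pair M1 and M2 does, so the optimum set size is two even though the mean over all two-marker combinations is lower.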

Results

SSR segregation

DNA extracted from all samples was of high quality and was considered suitable for molecular marker genotyping (Supplemental Table 3). Of the 60 SSRs tested, 40 showed efficient amplification and appropriate polymorphism in the KT cross. A total of 20 polymorphic SSRs were obtained across the PUP, PRC, T×P and T128 crosses.

Parentage analysis

The results of allele frequency analysis for the five crosses (before removal of illegitimates) are presented in Supplemental Tables 4–8 and summarized in Table 2. The mean number of alleles ranged between 3.10 and 6.00, while the mean polymorphic information content (PIC) observed was between 0.403 and 0.653. Expected null allele frequency was high at five loci, two in the PRC cross and three in the T128 cross. In PRC, null alleles were detected for markers mEgCIR3769 (frequency = 0.1238) and sEg00161 (frequency = 0.0573). In T128, null alleles were observed for markers mEgCIR3362, sMo00131 and mEgCIR3557 at frequencies of 0.1048, 0.0592 and 0.1670, respectively. These null alleles occurred at frequencies of more than 0.05, which is considered significant (Kalinowski et al. 2007), so the affected markers were removed from parentage analysis. The remaining numbers of markers for parentage analysis in the PRC and T128 crosses were therefore 18 and 17, respectively.

Table 2. Results of allele frequency analysis across five different crosses, prior to removal of illegitimates

| Allele frequency parameter | KT | PRC | PUP | T×P | T128 |
|---|---|---|---|---|---|
| Mean number of alleles | 4.53 | 3.61 | 3.25 | 6.00 | 3.10 |
| Mean expected heterozygosity (HExp) | 0.671 | 0.568 | 0.535 | 0.710 | 0.521 |
| Mean polymorphic information content (PIC) | 0.607 | 0.489 | 0.445 | 0.653 | 0.403 |
| Combined probability of exclusion for first parent (CPE-P1) | 0.99999 | 0.97088 | 0.96487 | 0.99953 | 0.91159 |
| Combined probability of exclusion for second parent (CPE-P2) | 1.00000 | 0.99869 | 0.99795 | 1.00000 | 0.98369 |
| Combined probability of exclusion for parent pair (CPE-PP) | 1.00000 | 0.99998 | 0.999958 | 1.00000 | 0.99873 |

Fig. 1 illustrates the distribution of LOD values at 95% confidence against number of mismatch loci for each cross. These figures revealed that the number of mismatch loci increased with decreasing LOD values. Critical LOD values for mother, father and parent pair for each cross are presented in Supplemental Table 9. Palms below the threshold signified significant mismatching loci compared to their purported maternal and paternal palms. For KT cross, twenty-one palms distributed below the threshold are considered illegitimates (Fig. 1A). The number of suspected illegitimates in other crosses was 49 for PRC (Fig. 1B), 4 for PUP (Fig. 1C), 18 for T×P (Fig. 1D), and 6 for T128 (Fig. 1E). Further examination revealed that the illegitimates in families PUP, T×P and PRC had more mismatching loci to their paternal than maternal palms.

Fig. 1.

The LOD score of progenies (y-axis) against the number of mismatch loci (x-axis) for the KT (A), PRC (B), PUP (C), T×P (D) and T128 (E) crosses, compared to the maternal parent, paternal parent and parent pair of the respective crosses.

Further verification of the suspected illegitimates in the KT cross was carried out by examining their electropherogram profiles (Supplemental Fig. 1). The allelic patterns of the suspects were compared with the profiles of their assigned parents. For example, if the parent profile for marker mEgCIR3727 is ab × cd, then the progenies are expected to show ac, ad, bc or bd profiles. However, if, for example, progeny 1 inherited only one of the male alleles and progeny 2 none, they would have profiles deviating from those expected under Mendelian inheritance and would be suspected illegitimates.
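The Mendelian expectation in the ab × cd example can be expressed as a short Python sketch. It is a simplified strict-exclusion check at a single locus, with hypothetical function names, and it ignores null alleles and genotyping error, which the LOD-based CERVUS analysis is designed to accommodate.

```python
from itertools import product


def expected_genotypes(mother, father):
    """All progeny genotypes possible from one allele of each parent.

    mother, father: (allele, allele) tuples at a single locus.
    Genotypes are returned as sorted tuples so that allele order is ignored.
    """
    return {tuple(sorted((m, f))) for m, f in product(mother, father)}


def is_compatible(progeny, mother, father):
    """True if the progeny genotype is one of the expected Mendelian classes."""
    return tuple(sorted(progeny)) in expected_genotypes(mother, father)


# The ab x cd example from the text yields ac, ad, bc and bd.
classes = expected_genotypes(("a", "b"), ("c", "d"))
legit = is_compatible(("c", "a"), ("a", "b"), ("c", "d"))    # True
suspect = is_compatible(("a", "e"), ("a", "b"), ("c", "d"))  # False: allele e is in neither parent
```

For a selfed family such as T128, the same function can be called with the single parent passed as both mother and father, giving the aa, ab and bb classes.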

Fig. 2 shows the optimal number of markers required to detect all illegitimates (solid line) and the mean number of illegitimates detected for all possible marker combinations (dotted line) for a particular k value (defined in Materials and Methods). Essentially, the optimal number of markers required to detect all illegitimates in the crosses analyzed in this study is between two and four. Fig. 2 also shows that, if the optimal marker set is not used, all the illegitimates in a particular cross are detected on average only when the maximum number of polymorphic markers is utilized. The mean number of illegitimates detected (with standard deviation) for all possible marker combinations of a particular k value is presented in Supplemental Table 10. The best marker combinations for identification of illegitimates in each cross are summarized in Table 4. One of these markers, mEgCIR3727, could detect illegitimates in three crosses, while sMo00121 and mEgCIR3649 each identified suspects in two of the crosses.

Fig. 2.

The number of illegitimates detected with increasing number of markers applied. The solid line represents illegitimates detected by the best (optimal) marker combination and the dotted line represents the mean number of illegitimates detected for all possible combinations of markers for a particular k value; A: KT, B: PRC, C: PUP, D: T×P, E: T128. In Fig. 2A, as a large number of markers were involved, curve fitting (red line) was performed (see Materials and Methods).

Table 3. Results of allele frequency analysis across five different crosses (after removal of illegitimates)

| Allele frequency parameter | KT | PRC | PUP | T×P | T128 |
|---|---|---|---|---|---|
| Number of samples | 140 | 1 | 43 | 12 | 29 |
| Mean number of alleles | 3.40 | 2.67 | 2.65 | 3.10 | 2.00 |
| Mean expected heterozygosity (HExp) | 0.658 | 0.644 | 0.533 | 0.639 | 0.499 |
| Mean polymorphic information content (PIC) | 0.591 | 0.456 | 0.450 | 0.544 | 0.370 |
| Combined probability of exclusion for first parent (CPE-P1) | 0.99997 | 0.952546 | 0.99997 | 0.989654 | 0.887401 |
| Combined probability of exclusion for second parent (CPE-P2) | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 0.97000 |
| Combined probability of exclusion for parent pair (CPE-PP) | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |

Table 4. Best marker combination for illegitimate detection in each cross evaluated in the present study

| Population | Markers |
|---|---|
| KT | mEgCIR3727, mEgCIR3622, mEgCIR3311 |
| PRC | mEgCIR3649, mEgCIR3310, sMg00087, sMg00156 |
| PUP | mEgCIR3727, sMo00121 |
| T×P | mEgCIR3727, sMo00121, mEgCIR3362, mEgCIR2433 |
| T128 | mEgCIR2332, mEgCIR3649 |

The results of allele frequency analysis after exclusion of illegitimates are summarized in Table 3. In general, the mean number of alleles and PIC are lower than those presented in Table 2. Unspecific alleles possessed by the illegitimates may have contributed to higher estimates of allele frequency parameters, which can influence ranking of the best parental pair combination, especially in breeding programs (Ooi et al. 2019). The combined exclusion probability for the parent pair (CPE-PP) for the best marker combination of each cross ranged from 0.61987 to 0.99261, as presented in Table 5.

Table 5. Combined exclusion probability result for best marker combination in each cross

| Parameter | KT | PRC | PUP | T×P | T128 |
|---|---|---|---|---|---|
| Combined probability of exclusion for first parent (CPE-P1) | 0.60893 | 0.69029 | 0.40800 | 0.82404 | 0.28014 |
| Combined probability of exclusion for second parent (CPE-P2) | 0.81949 | 0.88496 | 0.62251 | 0.94622 | 0.44586 |
| Combined probability of exclusion for parent pair (CPE-PP) | 0.93958 | 0.970668 | 0.79302 | 0.99261 | 0.61987 |
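
To show why a set of only two to four markers gives a lower combined exclusion probability than the full marker panel, the sketch below combines per-locus exclusion probabilities under the standard assumption that loci are independent, so that the combined non-exclusion probability is the product of the per-locus non-exclusion probabilities. The function name and the per-locus values are hypothetical and are not taken from the study.

```python
from math import prod


def combined_exclusion(per_locus_exclusion):
    """Combined exclusion probability over independent loci.

    Combined = 1 - product of (1 - P_i), where P_i is the exclusion
    probability at locus i.
    """
    return 1.0 - prod(1.0 - p for p in per_locus_exclusion)


# Hypothetical per-locus exclusion probabilities (illustrative only).
three_loci = combined_exclusion([0.3, 0.4, 0.2])           # about 0.664
five_loci = combined_exclusion([0.3, 0.4, 0.2, 0.5, 0.5])  # about 0.916
```

Each added locus can only raise the combined value, which is consistent with the full panels in Tables 2 and 3 approaching 1.0 while the small optimal sets in Table 5 remain lower.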

Discussion

The initial allele frequency analysis revealed that the polymorphic markers recorded mean PIC >0.3 across all crosses, confirming their usefulness in parentage analysis and detection of potential illegitimates (Mateescu et al. 2005). The combined probability of exclusion recorded for the parent pair is at or close to 1.0 for all crosses (Table 2), indicating the high discrimination power of the applied set of markers for parentage analysis. In addition, these markers generated specific amplified products and alleles that are easy to score. The algorithms in CERVUS software accommodate a defined scoring error rate to improve the success rate of paternal assignment (Marshall et al. 1998). In an assumed perfect dataset (zero error), any mismatch locus denotes paternal exclusion. However, if errors are present in the dataset, simulation analysis in CERVUS software will identify a threshold value to determine true parental assignment. In studying the effect of ignoring error in a dataset, Marshall et al. (1998) found that paternal assignment at the 80% confidence level (assuming a zero-error dataset) actually had a true confidence of 74%, a finding we took into consideration in the analysis. Our results in Fig. 1 indicate that the likelihood of being a true offspring (LOD) decreases as more mismatching loci are observed, which is in agreement with expectation. These results also support the accuracy of the genotyping and parentage analyses implemented in the present study. Furthermore, as all illegitimates in this study were detected by at least two SSR markers, the occurrence of false positives (incorrectly identifying a palm as an illegitimate) is minimized. However, the possibility of false negatives (not detecting the presence of an illegitimate) should not be discounted, although the use of highly informative and polymorphic markers, as in this study, can help reduce this as well.

In identifying the optimal marker sets that can detect all illegitimates, it is important to avoid building up marker combinations at random, as detection power can decrease with increasing number of markers (data not shown). This will occur if highly polymorphic markers are selected early by chance and a larger number of less polymorphic markers selected later decreases the ability to detect all illegitimates. In the present study, the optimum sets of markers needed to identify all possible illegitimates differed among the crosses. We found that the KT, PRC and T×P crosses required optimum sets of three or more markers, while PUP and T128 required only two. The KT, PRC and T×P crosses recorded relatively higher diversity, in terms of HExp and PIC, compared to the other crosses. This indicates that larger optimum sets of markers are needed to detect all illegitimates in crosses with higher levels of genetic diversity. The low diversity observed for the germplasm cross T128 compared to the others was likely due to the selfing used to generate this family, which is also reflected in its low HExp. Two markers were sufficient to identify all illegitimates in this cross, similar to that observed for the PUP cross. Further, we shortlisted specific markers capable of identifying all illegitimates in the crosses analyzed in the present research. Specific markers such as mEgCIR3727, sMo00121 and mEgCIR3649 could identify illegitimates in more than one cross. These markers recorded high PIC and should therefore be informative in detecting most of the alleles present in the crosses, including those exhibited by the illegitimates. The eleven unique markers listed as detecting illegitimates in the crosses utilized in this study could form a core marker set that is potentially useful for detecting illegitimates in other crosses not included in the present study. However, there is always a possibility that additional markers could be required to reveal the full complement of illegitimates in other independent crosses, especially those involving different genetic backgrounds. Nevertheless, the highly informative set of markers described here can still form the principal set, to be expanded if necessary, to ensure the purity of controlled crosses in oil palm. This study further revealed that selecting markers at random to detect illegitimates is not effective. Establishing a highly informative core set is the most efficient and cost-effective approach for establishing purity of crosses.

Establishing the optimum set of SSR markers for general fingerprinting is routinely carried out for other crops such as rice (Sundaram et al. 2008), cotton (Selvakumar et al. 2010), safflower (Naresh et al. 2009), olive (Rosa et al. 2004), peach (Rojas et al. 2008), Populus (Rajora and Rahman 2001), Picea (Narendrula and Nkongolo 2012), tea (Tan et al. 2019), blackberry (Zurn et al. 2018), coconut (Azevedo et al. 2018), shrub (Hao et al. 2020), Piper (Christ et al. 2018) and Pinus (Elliott et al. 2005), where sets of 6 to 18 SSR markers were found suitable to enhance breeding and selection programs.

We observed mismatching loci between the progenies and both maternal and paternal palms. The mismatches between progenies and the paternal parent could have been due to contaminated pollen resulting from, for example, torn pollination bags or human error, such as tying the bags too loosely and thereby allowing in insects carrying unspecific pollen (Chin 1993, Corley 2005, Donough et al. 1992). This signifies that periodic systematic quality checks should be carried out to detect faulty pollination. On the other hand, nursery practices such as planting the wrong seeds, and mix-ups occurring during transfer from small to bigger polybags, may contribute to mismatches between progenies and maternal palms.

In oil palm breeding, crosses are continuously made with the inevitable specter of mislabelling and mix-ups. Based on the results presented here, oil palm breeders can now apply the core set of molecular markers as a quality control tool for legitimacy testing. The polymorphic nature of SSR markers also likely makes them more cost-efficient in ensuring legitimacy in controlled crosses. As a comparison, a recent study (Teh et al. 2019) indicated that up to 80 SNPs are likely required to effectively detect dura contamination in D×P progeny. The progeny-specific core set of SSR markers identified in the present study provides a cost-efficient tool that can be readily adopted by the industry to ensure the purity of a wide range of controlled crosses and to better manage oil palm improvement programs.

Author Contribution Statement

Siti Hazirah Z. carried out the experiment, analysis and data interpretation. Siti Hazirah Z. and Maizura I. drafted the manuscript. Chan K.L. helped in data analysis. Mohd Isa Z.A. supplied the genetic materials and provided critical feedback on the results. Ismanizan I. supervised the project. Ting N.C. and Ooi L.C.L. supplied DNA materials for the study. Rajinder S. conceived the idea and was involved in the overall direction and planning of the research, besides revising the manuscript for intellectual content. Maizura I., as corresponding author, encouraged the first author to investigate and supervised the work.

Acknowledgments

The authors thank the Director-General of MPOB for permission to publish this article. Authors also would like to extend their appreciation to Mr. Chang Kwong Choong for carefully editing the manuscript.

Literature Cited
 
© 2021 by JAPANESE SOCIETY OF BREEDING