Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Short communication
Development and application of a sex-linked marker for Herpetospermum pedunculosum based on whole-genome resequencing
An-Ning Li, Zhi-Li Zhou, Xi-Long Wang, Xue-Mei Wen, Yan-Li Tu, Li-Hua Meng

2025 Volume 100 Article ID: 24-00182

ABSTRACT

Sex-specific DNA markers are effective tools for sex identification and sex-controlled breeding of dioecious organisms. The seeds of the dioecious Herpetospermum pedunculosum are used in traditional Chinese medicine, and the development of sex-linked markers for seedlings is crucial for increasing the proportion of female plants in cultivation. In this study, we screened sex-specific markers based on whole-genome resequencing of 20 male and 24 female H. pedunculosum individuals, and validated a male-specific DNA fragment of 505 bp among 80 individuals from four populations using simple PCR. The findings provide a reliable male-specific marker for the sex identification of H. pedunculosum seedlings.

MAIN

Sex is a fundamental trait in both animals and plants. Most flowering plants (~90%) are hermaphroditic, with male and female functions in one flower. Dioecy, an extreme form of sexual segregation, accounts for about 6% of flowering plants (Heikrujam et al., 2015). For economically important dioecious plants (e.g., kiwifruit and pistachio), increasing the proportion of female plants potentially enhances fruit production. However, it is generally difficult to determine the sex of dioecious plant species prior to flowering, underscoring the importance of sex identification for seedlings of dioecious plants.

Sex-specific markers are valuable tools for identifying the sex of dioecious organisms (Zheng et al., 2024), and recent advancements in high-throughput sequencing technology facilitate the large-scale discovery of genome-wide molecular markers (Xu et al., 2024). Indels are widely distributed across the genome in high density and abundance (Yang et al., 2016), making them suitable as molecular markers (Gao et al., 2012). For example, indel markers have been utilized to assess genetic diversity in Chenopodium quinoa (Zhang et al., 2017), as well as for variety identification in Glycine max (Chen et al., 2021) and Momordica charantia (Cui et al., 2021). Additionally, indel markers are particularly effective for sex identification in fish (Du et al., 2023; Huang et al., 2024b; Zheng et al., 2024).

Herpetospermum pedunculosum (Cucurbitaceae) is a traditional Chinese medicinal plant distributed in southwest China, Nepal and northeast India; its seeds are used to treat liver disease, cholecystitis and dyspepsia (Wei et al., 2020). In dioecious Cucurbitaceae species, both heteromorphic and homomorphic sex chromosomes have been reported, and an XY sex determination system has been documented (Ming et al., 2011) in, e.g., Trichosanthes pilosa (Zhao et al., 2024) and Coccinia grandis (Janousek et al., 2022). Our unpublished genome data also strongly suggested an XY sex determination system in H. pedunculosum. Herpetospermum pedunculosum has been widely cultivated in Xizang, but wild populations of this species are strongly male-biased (ca. 70%) (Lan et al., 2011; Chen et al., 2020; Wu et al., 2024). Therefore, increasing the proportion of female plants could enhance fruit production, highlighting the importance of identifying the sex of H. pedunculosum seedlings.

We collected leaves of 20 male and 24 female H. pedunculosum individuals from a wild population in Shangri-La (27°54′36″ N, 99°38′24″ E, 3,250 m), northwest Yunnan province. DNA of each sample was extracted using the CTAB method for genome resequencing. The quality and quantity of DNA were assessed by agarose gel (0.8%) electrophoresis, a Nanodrop 2000 spectrophotometer and a Qubit fluorometer. The genomic DNA of each individual was randomly sheared into ~350-bp fragments with a Covaris ultrasonicator. Library preparation was performed with a Vazyme DNA library preparation kit (NDM607-01), involving end-repair, 3′ adenylation, adapter ligation, purification and PCR amplification. Subsequently, paired-end sequencing was performed on the DNBSEQ-T7 platform (BGI, China).

Quality control of the raw data was performed with fastp 0.12.4 (Chen et al., 2018) using default parameters to obtain clean data. The reads from the 20 male and 24 female plants were then aligned to the reference genome using BWA 0.7.17-r1188 (Li and Durbin, 2009) with default parameters, and duplicate reads were removed using Picard MarkDuplicates (http://broadinstitute.github.io/picard). SNPs and indels were detected using GATK 4.2.2.0 (McKenna et al., 2010).

A total of 1,995,047,592 clean reads from male plants and 1,992,527,572 clean reads from female plants were generated. For male and female plants, the average alignment rates were 98.09% and 99.22%, and the average coverage rates were 96.03% and 93.93%, respectively. The average sequencing depths were 17.85× for male plants and 15.04× for female plants. Nucleotide statistics revealed that the average Q20 values for clean data were 98.03% and 97.70%, while Q30 values were 94.18% and 92.94%, for male and female plants, respectively. The GC content was 38.50% for males and 38.19% for females.
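Because the two groups differ in size (20 males versus 24 females), the read totals are easier to compare per plant. The short Python sketch below does this arithmetic with the figures reported above; the variable and function names are illustrative only, and the comparison with sequencing depth is a rough consistency check rather than part of the original analysis.

```python
# Totals, group sizes and mean depths as reported in the text.
CLEAN_READS = {"male": 1_995_047_592, "female": 1_992_527_572}
N_PLANTS = {"male": 20, "female": 24}
MEAN_DEPTH = {"male": 17.85, "female": 15.04}


def reads_per_plant(sex: str) -> float:
    """Average number of clean reads per resequenced plant of the given sex."""
    return CLEAN_READS[sex] / N_PLANTS[sex]


# Ratio of per-plant read yield (male / female) and ratio of mean depth.
read_ratio = reads_per_plant("male") / reads_per_plant("female")
depth_ratio = MEAN_DEPTH["male"] / MEAN_DEPTH["female"]

for sex in ("male", "female"):
    millions = reads_per_plant(sex) / 1e6
    print(f"{sex}: {millions:.1f} M clean reads per plant, mean depth {MEAN_DEPTH[sex]}x")
print(f"read ratio {read_ratio:.2f}, depth ratio {depth_ratio:.2f}")
```

The two ratios are of similar size, which suggests that the higher average depth in males largely reflects the higher per-plant read yield.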

To examine the genetic relationships among male and female plants, we filtered SNPs and obtained 5,204,545 high-confidence SNPs. RAxML (Stamatakis, 2006), which implements a maximum likelihood algorithm, was then used to construct a phylogenetic tree of the 44 individuals, and ADMIXTURE (Liu et al., 2013) was used to analyze genetic structure based on a Bayesian mathematical model. The results suggested that some individuals were closely related (Supplementary Fig. S1), and that the 44 individuals could be divided into two groups, independent of sex (Supplementary Fig. S2). Because the resequenced plants shared relatedness and population structure, the effectiveness of markers should be validated experimentally in more populations.

The indel types were annotated with ANNOVAR (Wang et al., 2010). The number of indels ranged from 352,581 to 1,163,931, with an average of 666,654. There were 435,721 indels in intergenic regions and 170 indels in splicing regions (Fig. 1A). The heterozygosity of indels was 55.56% and 66.28% in female and male plants, respectively (Fig. 1B), indicating a possible XY sex determination system in H. pedunculosum. The number of indels was high on chromosome 2 (102,930 in females and 98,330 in males) but low on chromosome 8 (47,123 in females and 40,778 in males) (Fig. 1C). Short indels of no more than 5 bp were dominant (75.0% in females and 75.2% in males), while long indels exceeding 50 bp were rare (1.38% in females and 1.37% in males) (Fig. 1D). All data related to Figure 1 can be found in Supplementary Tables S1–S4.
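To make the heterozygosity figures concrete, the following Python sketch shows one way such a proportion could be computed from VCF-style genotype strings. It assumes that heterozygosity means heterozygous calls divided by all calls carrying at least one indel allele; the text does not state the exact definition, and the function name and toy genotypes are hypothetical.

```python
from typing import Iterable


def het_fraction(genotypes: Iterable[str]) -> float:
    """Fraction of indel-carrying diploid calls that are heterozygous.

    Assumed definition: het / (het + homozygous non-reference).
    Missing calls and homozygous-reference calls are ignored.
    """
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # missing or non-diploid call
        if alleles[0] != alleles[1]:
            het += 1
        elif alleles[0] != "0":
            hom_alt += 1  # homozygous for an indel allele
    total = het + hom_alt
    return het / total if total else 0.0


# Toy genotypes (hypothetical, not data from the study).
toy_calls = ["0/1", "1/1", "0/1", "./.", "0/0", "1|0"]
print(f"heterozygous fraction: {het_fraction(toy_calls):.2f}")
```

Under an XY system, male-specific heterozygous sites in the sex-linked region would be expected to raise this proportion in males relative to females, which is the pattern reported above.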

Fig. 1. General statistics of indels in resequenced male and female plants of Herpetospermum pedunculosum. (A) Indel type distribution. (B) Number of heterozygous and homozygous indels in female and male plants. (C) Number of indels on different chromosomes in female and male plants. (D) Number of indels with different lengths in female and male plants. Error bars indicate mean ± SD.

Genome-wide association analysis suggested that sites with –log10(P) larger than 8 could be associated with sex (Wang et al., 2022), and we screened 954 indels located on chromosome 5 and unanchored scaffolds (Supplementary Fig. S3), which could identify the sex of at least 80% of all plants. For validation of sex-specific markers, leaves of 41 male and 39 female plants were collected from four wild populations (Table 1). Genomic DNA of each sample was extracted using the CTAB method. Primers with a length of 20 to 30 bp were designed for the indels with Oligo software (Rychlik, 2007), targeting a product length of 200 to 1,000 bp. PCR amplification was performed in a 25-μl reaction volume containing 12.5 μl of 2× SanTaq PCR Master Mix (Sangon Biotech, Shanghai, China), 10 μM each primer, 9.5 μl of deionized water and 1 μl of genomic DNA. The thermal cycling conditions were as follows: pre-denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 52–59 °C for 30 s and extension at 72 °C for 30–60 s; and a final elongation step at 72 °C for 7 min. The PCR products were then analyzed by 1–2% agarose gel electrophoresis.
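The two screening criteria mentioned above (a –log10(P) value above 8, and correct sex assignment for at least 80% of plants) can be sketched in Python as follows. This is only an illustration: it assumes a male-specific presence/absence pattern, assumes that both criteria are applied together, and uses hypothetical function names and toy data rather than the authors' actual procedure.

```python
import math
from typing import Dict


def neg_log10(p_value: float) -> float:
    """Association score on the -log10(P) scale."""
    return -math.log10(p_value)


def sex_concordance(has_allele: Dict[str, bool], sex: Dict[str, str]) -> float:
    """Share of plants fitting a male-specific pattern:
    allele present in males and absent in females (assumed pattern)."""
    hits = sum(
        1 for plant, present in has_allele.items()
        if present == (sex[plant] == "male")
    )
    return hits / len(has_allele)


def passes_screen(p_value: float, has_allele: Dict[str, bool],
                  sex: Dict[str, str], min_score: float = 8.0,
                  min_concordance: float = 0.8) -> bool:
    """Keep an indel only if it meets both thresholds named in the text."""
    return (neg_log10(p_value) > min_score
            and sex_concordance(has_allele, sex) >= min_concordance)


# Toy example with five hypothetical plants.
sex = {"m1": "male", "m2": "male", "m3": "male", "f1": "female", "f2": "female"}
calls = {"m1": True, "m2": True, "m3": False, "f1": False, "f2": False}
print(passes_screen(1e-9, calls, sex))  # strong association, 4 of 5 concordant
print(passes_screen(1e-5, calls, sex))  # association score too low
```

Candidates that pass such a screen still need PCR validation in independent plants, as described in the text, because concordance within the resequenced population does not guarantee concordance elsewhere.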

Table 1. Sampling sites of four wild populations of dioecious Herpetospermum pedunculosum

Population | Location | Male | Female
Shangri-La, Yunnan | 27°54′36″ N, 99°38′24″ E, 3,250 m | 12 | 12
Wengshang, Yunnan | 28°12′57″ N, 99°44′22″ E, 2,973 m | 12 | 12
Geza, Yunnan | 28°11′19″ N, 99°46′48″ E, 3,173 m | 12 | 10
Chentang, Xizang | 27°51′38″ N, 87°25′13″ E, 2,500 m | 5 | 5
Total | | 41 | 39

A total of 303 primer pairs were tested, and only one pair, designed from unanchored scaffolds, correctly identified the sex of all plants across the four populations (Fig. 2). The 505-bp fragment was amplified in all 41 male plants but in none of the 39 female plants. The forward primer sequence was TAGAGGAGTGAGAAAGAGGCCGTG, and the reverse primer sequence was CGCTTCTACGCGCAATCGGTTCA. The PCR product was sequenced, and the homologous sequences in the female genome and the male genome were compared using DNAMAN software (Lynnon Biosoft, San Ramon, CA, USA). There were three mismatched bases in the forward (upstream) primer site and 15 in the reverse (downstream) primer site in the female genome (Fig. 3), which probably explains the failed PCR amplification in the female plants.
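As a quick sanity check on the reported primers, the Python sketch below computes their length and GC content and compares them, together with the 505-bp product, against the design windows stated in the methods (20 to 30 bp primers, 200 to 1,000 bp product). The helper names are illustrative; no melting temperature is estimated because the text does not say which formula the design software applied.

```python
FORWARD = "TAGAGGAGTGAGAAAGAGGCCGTG"
REVERSE = "CGCTTCTACGCGCAATCGGTTCA"
PRODUCT_BP = 505

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def within_design_window(primer: str, product_bp: int) -> bool:
    """Check the primer and product lengths against the stated design ranges."""
    return 20 <= len(primer) <= 30 and 200 <= product_bp <= 1000


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(primer), "bp,", f"GC {gc_percent(primer):.1f}%,",
          "in window:", within_design_window(primer, PRODUCT_BP))

# The reverse primer anneals to the strand given by its reverse complement.
print("reverse primer site on the forward strand:", reverse_complement(REVERSE))
```

Locating the reverse-complemented primer in the forward-strand sequence is how the reverse primer site can be compared with the female homolog when counting mismatches such as those shown in Fig. 3.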

Fig. 2. Validation of the male-specific marker in all individuals from the four populations of Herpetospermum pedunculosum. M: DL 5000 DNA marker, with bottom-up bands of 100, 250, 500, 750, 1,000, 1,500, 2,000, 3,000 and 5,000 bp. (A) PCR amplification results in Shangri-La population (n = 24). (B) PCR amplification results in Wengshang population (n = 24). (C) PCR amplification results in Geza population (n = 22). (D) PCR amplification results in Chentang population (n = 10).

Fig. 3. Sequence alignment results of the male-specific sequence and its homologous sequences in the male and female genomes of Herpetospermum pedunculosum. Black boxes mark the positions of upstream and downstream primers, and cyan shading indicates bases with 100% identity among the three aligned sequences.

This validated sequence was homologous to the ethylene-responsive transcription factor 12-like gene (LOC111025114) in Momordica charantia (Supplementary Table S5). Ethylene is a key feminizing hormone in cucurbits (Switzenberg et al., 2014; Martínez and Jamilena, 2021), and the ethylene insensitive 2 gene (EIN2) in Cucumis melo inhibits the expression of the carpel inhibitor CmWIP1, promoting the development of female flowers (Huang et al., 2024a). The validated male-specific sequence was annotated to gene M-21960 (with a full length of 84,176 bp) in the male genome, and to gene F-13860 (with a full length of 620 bp) in the female genome of H. pedunculosum, and a considerable number of repeats in the M-21960 gene likely contributed to the difference in length. Between the two genes, the overall nucleotide alignment identity was 0.42% (Supplementary Fig. S4), and the overall amino acid alignment identity was 0.24% (Supplementary Fig. S5). Therefore, M-21960 and F-13860 may function differently in regulating ethylene synthesis between male and female plants of H. pedunculosum.

In conclusion, we validated a male-specific DNA marker for sex identification of H. pedunculosum. Molecular markers based on differences in sequences are more reliable than morphological, physiological and biochemical traits (Heikrujam et al., 2015). This validated marker can accurately identify the sex of H. pedunculosum seedlings, which should be of great value in increasing the proportion of female plants in the cultivation of this annual medicinal herb.

DECLARATIONS

Funding: This research was financially supported by the National Natural Science Foundation of China (32160261), and the Science and Technology Program of Xizang Autonomous Region (XZ202401JD0030).

Authors’ contributions: L.-H. M. designed the study and reviewed the draft of the manuscript, A.-N. L. conducted the experiments and prepared the manuscript, Z.-L. Z. analyzed the resequencing data, and X.-L. W., X.-M. W. and Y.-L. T. collected materials from the field.

Conflicts of interest: The authors declare no conflict of interest.

ACKNOWLEDGMENTS

We are grateful to all staff in the Shangri-La Alpine Botanical Garden for logistical support in the field.

REFERENCES
 
© 2025 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.