Edited by Norihiro Okada. Zhijian Cao: Corresponding author. E-mail: zjcao@whu.edu.cn. Wenxin Li: Corresponding author. E-mail: liwxlab@whu.edu.cn
The order Scorpiones, with sixteen living families and about 1500 described species and subspecies, is a rather small group of chelicerate arthropods (Prendini and Wheeler, 2005; Soleglad and Fet, 2003). Nevertheless, scorpions have received considerable attention over the years because of their medical importance, their antiquity and obvious importance to the analysis of chelicerate phylogeny, and their local abundance and wide geographical distribution (David, 1990). Moreover, a remarkable suite of biochemical, behavioral and ecological adaptations has ensured their continued success over the past 450 million years. Although these “living but sophisticated fossils” have changed little in morphology since their first appearance, their adaptations range from superbly efficient behavioral repertoires to maternal care of offspring, ultrasensitive tactile and visual fields, and complex venoms that are precise mixtures of different toxins, each with its own action (Goudet et al., 2002; Prendini et al., 2006). However, many taxa are poorly understood, and the relationship of scorpions to other chelicerates remains controversial. In particular, the phylogenetic position of Scorpiones, a key order in arachnid phylogeny, is highly disputed (Jones et al., 2007; Mallatt and Giribet, 2006; Murienne et al., 2008).
Genome studies of scorpions will help to resolve the phylogeny of the order Scorpiones and to uncover the secrets of these “living fossils”. A BAC library is a powerful tool for studying genomes. Since its development by Shizuya et al. (1992), the bacterial artificial chromosome (BAC) system has become the preferred approach for constructing genomic libraries and has been used for gene cloning, genome mapping and sequencing, because of its capacity for large inserts, high stability and convenient manipulation (Osoegawa et al., 2001). BACs have already been used to construct genomic libraries from many organisms, from animals to plants. However, no such large-insert genomic DNA library has been reported to date in the order Scorpiones. The Chinese scorpion M. martensii Karsch is widely distributed in China, Korea and Mongolia. It has also been an important raw material in traditional medicine for more than 1000 years, especially for treating neural diseases such as apoplexy, hemiplegia and facial paralysis (Shi and Zhang, 2005; Zhijian et al., 2006).
In this paper, we determined the genome size of the scorpion M. martensii Karsch, constructed its BAC library, and screened the library to obtain the complete sequence of the ribosomal RNA gene unit of M. martensii Karsch, which was previously unknown in the order Scorpiones. Taken together, these results provide important genetic resources and tools for comparative genomics and phylogenetic analysis.
Adult specimens of the scorpion M. martensii Karsch were purchased from Suizhou City in Hubei Province, Central China. The male reproductive organs and the female ovaries were separately ground with a pair of tweezers in ice-cold 1 × PBS. Each suspension was filtered through 40- to 100-μm nylon mesh and then washed twice with ice-cold 1 × PBS by centrifugation (about 250 g for 5 min) at 4°C. The cell density was adjusted to approximately 1 × 10⁶ cells/ml using a haemacytometer. DNA content was measured according to previously described methods (Vindelov et al., 1983; Wei et al., 2003). The propidium iodide (PI) fluorescence intensity was measured with Phoenix Flow Systems (Beckman and Coulter). At least 10,000 cells were measured per individual sample. The nuclear DNA content of the samples was estimated using the DNA content of chicken blood cells (2.5 pg/nucleus) as the control (Dolezel et al., 2003).
The scorpion BAC library was constructed as described previously, with minor modifications (Osoegawa et al., 1998; Zhang, 2000). Briefly, genomic DNA was isolated from testis cells, partially digested with HindIII, size-selected on agarose gels by pulsed-field gel electrophoresis (PFGE), and ligated into the CopyControl pCC1BAC HindIII cloning-ready vector (Epicentre Biotechnologies, Madison, WI, USA) as described (http://www.epibio.com/pdftechlit/177pl074.pdf). Ligation products were transformed by electroporation into the TransforMax Escherichia coli EPI300 strain (Epicentre Biotechnologies). White colonies were picked and transferred into 384-well plates for growth, storage, and replication. High-density filters representing the whole library were prepared on 30 nylon membranes, each carrying the BAC clones from four 384-well plates, with every clone spotted in duplicate. The average insert size was estimated and the stability of the BAC library was tested as described by Shizuya et al. (1992).
The R1, R2 and R3 probes were designed according to the complete sequence of the Pandinus imperator (Scorpiones; Scorpionidae) 5.8S ribosomal RNA gene (GenBank accession no. AY210830), the partial sequence of the P. imperator 28S ribosomal RNA gene (GenBank accession no. AY210830), and the partial sequence of the M. martensii Karsch 18S rRNA gene (GenBank accession no. AB008465), respectively. The primers used for R1 were 5’-GGCTGTACTCCCAAACAACCCGACT-3’ and 5’-CCTGTCTGAGGGTCGGACGAATAAC-3’. The primers used for R2 were 5’-CGCGAGACCCGACACTACCGT-3’ and 5’-ACCGCGAAAGCGGGGCCTAT-3’. The primers used for R3 were 5’-GGCAGTCCGGGAAACAAAGT-3’ and 5’-CCTACGGAAACCTTGTTACGACTT-3’. Probes were labeled with Biotin-11-dUTP by random priming, following the procedure of the North2South Biotin Random Prime Labeling Kit (Pierce). Hybridization was performed following the procedure of the North2South Chemiluminescent Hybridization and Detection Kit (Pierce).
The positive BAC DNAs were purified using the BAC96 Miniprep Kit (Millipore, USA). Primers for BAC end-sequencing were designed from the regions flanking the HindIII cloning site of the pCC1BAC vector. Primers specific to each end of the positive BAC clones were then designed to assemble the rRNA gene contig (data not shown). BAC clone 52F9, which contains the M. martensii Karsch rRNA gene, was sequenced by the Beijing Genomics Institute (BGI), China. The M. martensii Karsch rDNA sequence was analyzed with Gene Runner and by BLAST searches against the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/blast). The complete ribosomal DNA sequence of M. martensii Karsch reported here has been deposited in GenBank under the accession number FJ948787.
The 18S rDNA sequences used for alignment and phylogenetic analysis were obtained through NCBI ENTREZ (http://www.ncbi.nlm.nih.gov/entrez/) (Table 1). The alignment was performed with Clustal_X 1.83 followed by manual adjustment (Thompson et al., 1997), and viewed with Jalview (Waterhouse et al., 2009). Phylogenetic analysis was carried out with the neighbor-joining (NJ) and maximum parsimony (MP) methods implemented in MEGA3.1 (Kumar et al., 2004).
Table 1 Data on species for phylogenetic analysis
Flow cytometric analysis of the gonadal cell suspensions without chicken erythrocyte reference cells gave fluorescence histograms with three peaks (corresponding to nuclei with C, 2C and 4C DNA content) in mature testes, or two main peaks (2C and 4C) in immature or spent gonads. Likewise, analysis of the ovarian cell suspensions gave histograms with a main peak (2C) and a weak peak (4C). The position of the fluorescence peak of diploid (2C) cells of M. martensii Karsch was thereby determined. Subsequently, the corresponding sample was run together with reference cells, and the mean values of the diploid peaks of the scorpion and the chicken (internal standard) were recorded (Fig. 1). Estimated values of the average diploid (2C) DNA content are summarized in Table 2. As shown in Fig. 1 and Table 2, the mean 2C DNA content of the scorpion was 1.19 pg/nucleus. Moreover, t-tests showed no significant difference in DNA content between male and female cells (P > 0.05). Therefore, the haploid genome size of M. martensii Karsch was estimated to be approximately 600 Mbp. Although the chromosome numbers of several species in the order Scorpiones are known (Moustafa et al., 2005), the genome size of a scorpion species has not been reported previously. Compared with other arthropods, the genome of M. martensii Karsch is very small: its 2C value of 1.19 pg corresponds to a C-value of about 0.6 pg, whereas, for example, the C-values of spiders range from 0.74 pg to 5.7 pg (Gregory, 2005; Gregory and Shorthouse, 2003).
Fig. 1 Comparison of the DNA contents of the male scorpion cells and the chicken erythrocyte cells.
Table 2 Flow cytometry analysis of M. martensii Karsch against a chicken erythrocyte standard (2C = 2.5 pg)
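The conversion from peak positions to genome size can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the peak means are hypothetical channel values chosen only to reproduce the reported 2C value, and the factor of 978 Mbp per pg is the commonly used conversion, assumed here rather than stated in the text.

```python
# Assumption: 1 pg of DNA corresponds to about 978 Mbp (commonly used factor).
MBP_PER_PG = 978.0
# Chicken erythrocyte reference value used in the text (2C, pg/nucleus).
CHICKEN_2C_PG = 2.5


def diploid_content_pg(sample_peak, reference_peak, reference_2c_pg=CHICKEN_2C_PG):
    """Return the 2C DNA content (pg) from the ratio of the 2C peak means."""
    if reference_peak <= 0:
        raise ValueError("reference peak mean must be positive")
    return sample_peak / reference_peak * reference_2c_pg


def haploid_genome_mbp(diploid_pg, mbp_per_pg=MBP_PER_PG):
    """Convert a 2C DNA content (pg) into a haploid (1C) genome size in Mbp."""
    return diploid_pg / 2.0 * mbp_per_pg


# Hypothetical peak means in arbitrary channel units (not measured data).
two_c = diploid_content_pg(sample_peak=47.6, reference_peak=100.0)
print(f"2C content: {two_c:.2f} pg")
print(f"1C genome size: {haploid_genome_mbp(two_c):.0f} Mbp")
```

Under these assumptions a 2C value of 1.19 pg gives a haploid genome of roughly 582 Mbp, in line with the rounded estimate of about 600 Mbp.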
The BAC library of the scorpion M. martensii Karsch comprises 46,080 clones arrayed in 120 384-well plates. To determine the average insert size of the library, DNA from 200 random BAC clones was isolated, digested with NotI, and fractionated by PFGE (Fig. 2A). The average insert size of the BAC library was estimated to be 100 kb, with 92.5% of clones carrying inserts of 80 to 120 kb, and 3.0% of the clones were empty (Fig. 2B). Based on the estimated insert size and the genome size of M. martensii Karsch, the library represents about 7.7-fold genome coverage. This is the first BAC library in the order Scorpiones and provides a useful resource for further genomic and genetic studies.
Fig. 2 Analysis of the insert size from the scorpion M. martensii Karsch BAC library. A: Analysis of the insert size of 30 random BAC clones from the scorpion M. martensii Karsch BAC library by pulsed-field gel electrophoresis (PFGE). The DNA inserts were released by digestion with NotI and separated by PFGE. The 8.1 kb common band is the pCC1BAC (HindIII) cloning vector. The molecular weight size marker is the PFGE lambda DNA ladder (M). B: Insert size distribution of 200 clones randomly selected from the scorpion BAC library. The average insert size of the library was estimated to be 100.45 kb.
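The coverage figure follows directly from the clone count, the mean insert size and the genome size. The sketch below reproduces that arithmetic and, as an addition not made in the text, applies the standard Clarke-Carbon expression P = 1 - (1 - I/G)^N to estimate the chance that a given locus is present in the library. The function names and the optional empty-clone correction are illustrative assumptions.

```python
def genome_coverage(n_clones, mean_insert_kb, genome_mbp, empty_fraction=0.0):
    """Fold coverage = usable clones x mean insert size / haploid genome size."""
    usable_clones = n_clones * (1.0 - empty_fraction)
    return usable_clones * mean_insert_kb / (genome_mbp * 1000.0)


def recovery_probability(n_clones, mean_insert_kb, genome_mbp, empty_fraction=0.0):
    """Clarke-Carbon estimate: P = 1 - (1 - I/G)^N for a single-copy locus."""
    usable_clones = n_clones * (1.0 - empty_fraction)
    insert_fraction = mean_insert_kb / (genome_mbp * 1000.0)
    return 1.0 - (1.0 - insert_fraction) ** usable_clones


# Values reported in the text: 46,080 clones, ~100 kb inserts, ~600 Mbp genome.
print(f"coverage: {genome_coverage(46080, 100, 600):.2f}x")
print(f"coverage excluding 3% empty clones: "
      f"{genome_coverage(46080, 100, 600, empty_fraction=0.03):.2f}x")
print(f"P(locus present): {recovery_probability(46080, 100, 600):.4f}")
```

Without the empty-clone correction the calculation gives 7.68-fold coverage, which matches the reported value of about 7.7-fold; the corresponding recovery probability is above 0.999.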
To further characterize the quality of the library and facilitate its application, we screened the BAC library using 18S-5.8S-28S rDNA as probes. The probe pool was made by mixing the 5.8S, 28S and 18S partial rDNA sequences of M. martensii Karsch. High-density nylon membranes representing the whole BAC library were hybridized with the probe pool, and 426 positive BAC clones were obtained, representing 0.92% of the library clones. These positive BACs were picked from the library and transferred onto a low-density nylon membrane filter, which was then hybridized with each individual probe to identify the positive BACs corresponding to each probe. Almost all positive BAC clones overlapped with each other, as revealed by restriction enzyme analysis with HindIII (not shown) and NotI (Fig. 3), PCR analysis, and DNA sequencing. In particular, BAC end-sequencing and PCR analyses showed that six positive BAC clones (605, 7F12, 7H11, 26F11, 52M15 and 52F9) cover a genomic region of ~120 kb and contain rDNA (Fig. 4).
Fig. 3 Six individual positive BAC clones were digested with NotI. The result indicates that all BAC clones (excluding 26G17) overlapped with each other.
Fig. 4 Map of the BAC contigs covering the region of the scorpion 18S-5.8S-28S rRNA gene. The contig was constructed based on analysis of HindIII and NotI digestion, BAC end-sequencing, PCR identification, and finally complete DNA sequencing of the positive BAC clone 52F9. The locations of the 18S, 5.8S and 28S rRNA coding sequences are indicated below the map of BAC clone 52F9. The arrow shows the transcription initiation site (TIS).
A genomic BAC clone, 52F9, was isolated by screening the HindIII BAC library using the 18S, 5.8S, and 28S rRNA coding sequences as individual probes. Sequence analysis showed that 52F9 contains only one rDNA repeat unit, because the unit lies near the terminus of this clone. This is the first report of the complete sequence of an rDNA unit in the order Scorpiones. The rRNA gene of M. martensii Karsch is organized as ETS, 18S, ITS1, 5.8S, ITS2, and 28S, the same gene organization and structure as in other animals. Sequence analysis of clone 52F9 revealed an 8779 bp rDNA unit (530 bp ETS, 1813 bp 18S rDNA, 2168 bp ITS1, 157 bp 5.8S rDNA, 288 bp ITS2 and 3823 bp 28S rDNA) with a GC content of 54.53% in the coding sequences and 54.62% in the complete rDNA unit (Table 3). The sequences of the M. martensii Karsch rRNA genes share high similarity with the previously reported partial rRNA gene sequences from other species of Scorpiones (Soleglad and Fet, 2003); until now, however, no complete rRNA gene sequence had been available for the order.
Table 3 The lengths and GC contents of ETS and ITS regions, 18S, 5.8S, and 28S rRNA genes
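The reported component lengths can be checked against the total unit length, and GC content can be computed for any region, with a few lines of code. The sketch below is illustrative only: the coordinates assume that the components are contiguous in transcription order and that the unit starts at the first base of the ETS, and the short test sequence is a made-up string, not part of the deposited sequence.

```python
# Component lengths (bp) as reported in the text, in transcription order.
RDNA_COMPONENTS_BP = {
    "ETS": 530,
    "18S": 1813,
    "ITS1": 2168,
    "5.8S": 157,
    "ITS2": 288,
    "28S": 3823,
}


def unit_length(components):
    """Total length of the rDNA unit as the sum of its components."""
    return sum(components.values())


def component_coordinates(components):
    """1-based inclusive coordinates, assuming contiguous components."""
    coordinates = {}
    start = 1
    for name, length in components.items():
        coordinates[name] = (start, start + length - 1)
        start += length
    return coordinates


def gc_content(sequence):
    """GC percentage over unambiguous bases (A, C, G, T) only."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


print(unit_length(RDNA_COMPONENTS_BP))  # 8779, matching the reported unit length
for name, (start, end) in component_coordinates(RDNA_COMPONENTS_BP).items():
    print(f"{name}: {start}-{end}")
print(gc_content("ATGCGGCC"))  # hypothetical toy sequence
```

Applied to the sequence deposited under FJ948787, `gc_content` could be used to recompute the regional values summarized in Table 3, provided the region boundaries are taken from the GenBank annotation rather than from the assumed coordinates above.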
The transcription initiation site (TIS) was predicted using the Neural Network Promoter Prediction software (http://www.fruitfly.org/seq_tools/promoter.html) and the WWW Promoter Scan software (http://www-bimas.cit.nih.gov/molbio/proscan/). The prediction placed the TIS (+1) at a C residue 530 nt upstream of the 18S rRNA gene (Fig. 5 and Fig. 6). The rRNA gene promoter has been characterized in several organisms, ranging from protozoa to vertebrates. There is little sequence similarity between the ribosomal promoters of different species, and only closely related organisms show some degree of sequence conservation (Mallatt et al., 2004). Interestingly, 308 bp and 551 bp tandem repeat elements are present in the intergenic spacer (IGS) of the M. martensii Karsch rRNA gene (Fig. 5). The 308 bp element, spanning -2168 nt to -1243 nt upstream of the TIS, is repeated three times, while the 551 bp element, spanning -1173 nt to -71 nt upstream of the TIS, is repeated twice. The rRNA genes of yeasts and multicellular eukaryotes are organized in tandem arrays in which the genes are separated by intergenic spacers composed of repetitive sequence elements, such as ~5 copies of a 63 bp repeat in Leishmania major, ~8 copies of a 172 bp repeat in Trypanosoma cruzi (Martinez-Calvillo et al., 2001), ~8 copies of 35 bp, ~5 copies of 100 bp and ~20 copies of 60/81 bp repeats in Xenopus (Robinett et al., 1997), and ~6 copies of 6–500 bp repeats in Drosophila (Stage and Eickbush, 2007). Although the sequences of the intergenic spacers are not conserved among different eukaryotes, the presence and analogous arrangement of repetitive enhancer elements, promoter duplications, and terminator sequences in Xenopus, Drosophila, and mice suggest that there is a functional conservation of intergenic spacer elements across distantly related species (Robinett et al., 1997; Stage and Eickbush, 2007).
Fig. 5 DNA sequence of the intergenic region of the M. martensii Karsch rRNA unit. The TIS (+1) is underlined and denoted by an arrow. The locations of the 308 bp and 551 bp IGSRE are shown, with identical positions indicated by asterisks.
Fig. 6 DNA sequence of the ITS1, 5.8S, and ITS2 regions of the M. martensii Karsch rRNA unit. The locations of the 242 bp and 61 bp repeats in ITS1 and the 61 bp repeat in ITS2 are shown, with identical positions indicated by asterisks. The 5.8S rRNA gene, the 3’ end of the 18S rRNA gene and the 5’ end of the 28S rRNA gene are also indicated.
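The copy numbers of the IGS repeat arrays can be cross-checked from their coordinates, and tandem copies can be counted directly in a sequence. The following sketch is a simplified illustration and not the method used in this study: `count_tandem_copies` is a hypothetical helper that compares fixed-length windows and tolerates substitutions only, so it would miss copies that differ by insertions or deletions.

```python
def copies_in_span(start, end, period):
    """Copies implied by inclusive coordinates (relative to the TIS) and a period."""
    span = abs(end - start) + 1
    return span / period


def count_tandem_copies(sequence, start, period, max_mismatch_fraction=0.1):
    """Count consecutive copies of sequence[start:start + period].

    Windows are compared position by position; a window is accepted as a copy
    when its mismatch fraction does not exceed max_mismatch_fraction.
    """
    unit = sequence[start:start + period]
    if len(unit) < period:
        return 0
    copies = 1
    position = start + period
    while position + period <= len(sequence):
        window = sequence[position:position + period]
        mismatches = sum(a != b for a, b in zip(unit, window))
        if mismatches > max_mismatch_fraction * period:
            break
        copies += 1
        position += period
    return copies


# Coordinates reported in the text for the two IGS repeat arrays.
print(f"308 bp array: {copies_in_span(-2168, -1243, 308):.2f} copies")
print(f"551 bp array: {copies_in_span(-1173, -71, 551):.2f} copies")

# Hypothetical toy sequence: three copies of ACGT followed by unrelated bases.
print(count_tandem_copies("ACGTACGTACGTTTTT", start=0, period=4))
```

The coordinate check gives close to three copies for the 308 bp element and close to two for the 551 bp element, consistent with the repeat numbers described for the IGS.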
Interestingly, ITS1 contains four 242 bp repeats and two 61 bp repeats. Surprisingly, a sequence similar to the 61 bp tandem repeat element of ITS1 is also present in ITS2 (Fig. 6). The internal transcribed spacers of the rDNA repeat unit are among the most commonly applied phylogenetic markers (Keller et al., 2009; Schultz et al., 2006). However, comparison with other scorpions and arthropods showed that the ITS1 and ITS2 sequences are highly variable. Comparative sequence analysis of rRNA units from different organisms has shown that sequence similarities are found only within related groups, so the ITS regions can be used for phylogenetic reconstruction only at low taxonomic levels.
The phylogenetic trees resulting from NJ and MP analyses based on 18S rDNA sequences, with Calocheiridius cf. termitophilus (Arachnida: Pseudoscorpiones, GenBank accession no. AY859559) as the outgroup species, are presented in Fig. 7. The 18S rDNA analysis included ten taxa from seven families: Buthidae, Caraboctonidae, Chaerilidae, Euscorpiidae, Pseudochactidae, Scorpionidae, and Vaejovidae. Of a total of 885 characters, 714 characters were constant, 155 variable characters were parsimony-uninformative, and 49 characters were parsimony-informative. The topology of the NJ tree was similar to that of the MP tree. The tree topologies showed that parvorder Iurida was clearly a sister group to the other three parvorders (clade Buthida + Chaerilida + Pseudochactida), a result that disagrees with previous descriptions (Prendini and Wheeler, 2005; Soleglad and Fet, 2003). Resolving this outstanding question will require sequence information from more species and more sophisticated methods of phylogeny reconstruction.
Fig. 7 Cladograms (A: NJ tree; B: MP tree) showing the phylogeny of ten taxa based on 18S rDNA sequences. Numbers represent bootstrap percentages. The topologies were tested using bootstrap analysis (10000 replicates).
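The character classes reported for the 18S alignment (constant, parsimony-uninformative, parsimony-informative) follow a standard definition that is easy to state in code. The sketch below is a minimal illustration, not the MEGA3.1 implementation; it assumes that gaps and ambiguous symbols are ignored within a column, which may differ from the settings used in the analysis, and the four-sequence alignment is a made-up example.

```python
from collections import Counter


def classify_sites(alignment, ignore="-?N"):
    """Count constant, parsimony-uninformative and parsimony-informative columns.

    A column is parsimony-informative when at least two character states
    each occur in at least two sequences.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    counts = {"constant": 0, "uninformative": 0, "informative": 0}
    for column in zip(*alignment):
        states = Counter(c.upper() for c in column if c.upper() not in ignore)
        if len(states) <= 1:
            counts["constant"] += 1
        elif sum(1 for n in states.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["uninformative"] += 1
    return counts


# Hypothetical toy alignment of four sequences (not data from this study).
toy_alignment = ["AACA", "AACA", "ATGA", "ATGT"]
print(classify_sites(toy_alignment))
```

Because every column falls into exactly one class, the three counts always sum to the alignment length, which gives a quick consistency check on reported character statistics.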
In conclusion, we report the first BAC library in the order Scorpiones, constructed from the scorpion M. martensii Karsch, which will be an important genetic resource for phylogenetic analysis, comparative genomics and toxicological research on scorpions. We also determined the complete sequence of the rDNA unit, which provides not only further insight into the genomic organization of rRNA genes in scorpion species but also an informative genetic marker for phylogenetic reconstruction of the order Scorpiones.
The authors thank Prof. Wang Yaping and Liao Lanjie of the Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, and Prof. Ma Lunlin and Miss Liu Ka of the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, for technical advice and assistance. This study was supported by grants from the National Natural Sciences Foundation of China to Li Wenxin (No. 30530140), the Basic Sciences of State Commission of Science Technology of China to Li Wenxin (No. 2007FY210800), the 973 Program to Wu Yingliang (No. 2010CB529800), and the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT0745).