Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
A novel composite retrotransposon derived from or generated independently of the SVA (SINE/VNTR/Alu) transposon has undergone proliferation in gibbon genomes
Toru Hara, Yuriko Hirai, Sudarath Baicharoen, Takashi Hayakawa, Hirohisa Hirai, Akihiko Koga

2012 Volume 87 Issue 3 Pages 181-190

ABSTRACT

The superfamily Hominoidea (hominoids) comprises two families: Hominidae (hominids) and Hylobatidae (gibbons, also called small apes). The SVA transposon is a composite retrotransposon that occurs widely in hominoids and is considered to have been generated by stepwise fusions of three genetic elements: SINE-R, a variable number of tandem repeat (VNTR) sequence, and Alu. We identified a novel transposon whose basic structure is the same as that of SVA, with one prominent difference being the presence of part of prostaglandin reductase 2 (PTGR2) in place of SINE-R. We designate this composite transposon as PVA and propose two possible mechanisms regarding its generation. One is the derivation of PVA from SVA: the SINE-R region of SVA was replaced with a PTGR2 fragment by template switching. The other is the formation of PVA independently of SVA: a PTGR2 fragment was fused to an evolutionary intermediate comprising the VNTR and Alu regions. The nucleotide sequence of the junction between the VNTR and PTGR2 regions supports the second hypothesis. We identified PVA in the white-cheeked gibbon Nomascus leucogenys by analysis of genome sequence databases, and subsequent experimental analysis revealed its presence in all four gibbon genera. The white-cheeked gibbon harbors at least 93 PVA copies in its haploid genome. Another SVA-like composite transposon carrying parts of the LINE1 and Alu transposons in place of SINE-R, designated as LAVA, has recently been reported. The significance of the discovery of PVA is that its substituted fragment originates not from a transposon but from a single-copy gene. PVA should provide additional insights into the transposition mechanism of this type of composite transposon; the transposition activity is conferred even if the substituted fragment is not related to a transposon.

INTRODUCTION

In humans, three transposons are, or have been until recently, active in transposition: LINE1 (abbreviated as L1), Alu, and SVA (Mills et al., 2007). These are all retrotransposons. Only L1 contains autonomous copies, where transposon autonomy is defined as the possession of all genes and signals required for its own transposition. The other two elements are nonautonomous and hijack L1’s reverse transcription system for their transposition (Dewannieux et al., 2003; Hancks et al., 2011; Raiz et al., 2012).

All these transposons reportedly cause various human diseases and malfunctions, for example, Fukuyama muscular dystrophy (Watanabe et al., 2005) and X-linked dystonia-parkinsonism (Makino et al., 2007) caused by SVA. SVA also exhibits features that may influence genome function, for example, species-specific epigenetic states in sperm (Molaro et al., 2011) and the possession of multiple transcription start sites and multiple splice acceptor sites (Hancks et al., 2009). A model has been proposed regarding the origin of SVA (Hancks and Kazazian, 2010); however, it is yet to be demonstrated experimentally.

SVA comprises three main parts in the following order from its 5' end: the Alu, variable number of tandem repeat (VNTR), and SINE regions. The Alu and SINE regions have sequence similarity to the respective retrotransposons. The VNTR region is thought to originate from a solo element containing the VNTR sequence and a unique 3' sequence (Hancks and Kazazian, 2010). This element was named SVA2 (Jurka, 2004). Whether SVA2 is a processed pseudogene or a retrotransposon remains unknown (Hancks and Kazazian, 2010).

SVA uses L1’s transposition machinery, including the reverse transcriptase enzyme, for its transposition reaction (Hancks et al., 2011; Raiz et al., 2012). As expected, the transposition hallmarks of L1 hold true for SVA: the generation of target site duplication (TSD) upon insertion, frequent occurrence of 5' truncation at various positions, and relocation of genomic DNA fragments by 3' transduction (Ostertag and Kazazian, 2001; Ostertag et al., 2003; Wang et al., 2005; Hancks and Kazazian, 2010). In addition, 5' transduction has been observed (Wang et al., 2005; Bantysh and Buzdin, 2009; Damert et al., 2009; Hancks et al., 2009). In rare cases, the transposition efficiency is increased by the incidence of transduction (Zabolotneva et al., 2012; Raiz et al., 2012).

In the present study, we discovered a novel SVA-like transposon, which we designated as PVA. Analysis of genome sequences from databases revealed that this transposon had proliferated in the genome of the white-cheeked gibbon (Nomascus leucogenys). Our subsequent experimental analysis showed the presence of PVA as multiple copies in all four gibbon genera. PVA comprises three main parts: part of prostaglandin reductase 2 (PTGR2), VNTR, and Alu. In humans, PTGR2 is a single-copy gene located on the long arm of chromosome 14 (14q24.1–q24.2). The white-cheeked gibbon genome contains a single-copy sequence (Gene ID on Ensembl: ENSNLEG00000016076) considered to be homologous to human PTGR2. Although its function as a gene has not been examined in the gibbon, the exon-intron structure predicted from this sequence is similar to that of human PTGR2, and the amino acid sequence deduced from the predicted transcript sequence (ENSNLET00000020486) exhibits a high identity (98.9% (347/351) with no insertion or deletion) to that deduced from the major human PTGR2 transcript. Thus, the possibility that it is not a true PTGR2 is negligible. Therefore, we used PTGR2 as the name of this predicted gene.

Recently, an SVA-like transposon, designated as LAVA, whose structure is similar to SVA, was found in another gibbon species, Hoolock leuconedys, and was shown to be present in all four gibbon genera (Carbone et al., 2012). However, there is a significant difference between SVA/LAVA and PVA; the 3' regions of the former two elements are derived from transposons, whereas the corresponding region of PVA originates from a single-copy gene. This is the first report of such an element and provides additional insights into the origins of SVA-type composite retrotransposons, their transposition mechanisms, and their genomic impact in hominoids, including humans.

MATERIALS AND METHODS

Genome sequence analysis

The genome sequence data of humans (release hg19; February 2009) and the white-cheeked gibbon (release nomLeu1; January 2010) were obtained from the UCSC Genome Browser database (http://genome.ucsc.edu) and used as local databases. Human SVA, SVA2 consensus, and human endogenous retrovirus K (HERV-K) sequences were obtained from the Repbase Update database (Jurka et al., 2005; http://www.girinst.org/repbase/update/) with their respective entries. Results of gene prediction were obtained from Ensembl (http://www.ensembl.org/index.html; release 67).

The following programs and analysis packages were used in the present study: Blat (http://genome.ucsc.edu), BioPython (http://www.biopython.org), BLAST version 2.2.25+ (Camacho et al., 2009; ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+), MAFFT version 6 (Katoh and Toh, 2008; http://mafft.cbrc.jp/alignment/server/index.html), RepeatMasker (developed by A. F. A. Smit, R. Hubley and P. Green; http://www.repeatmasker.org), and RepeatModeler-1.0.5 (Smit, unpublished; http://www.repeatmasker.org/RepeatModeler.html). Specific conditions for their use are explained for each case.

Animals for collection of cells and genomic DNA

The animals we used in the present study were: human Homo sapiens (an adult male donor), chimpanzee Pan troglodytes (adult male born in Kyoto University Primate Research Institute (KUPRI)), gorilla Gorilla gorilla (adult male bred at Kyoto City Zoo, Japan), orangutan Pongo pygmaeus (adult male bred at Nagoya City Higashiyama Zoo, Japan), white-cheeked gibbon Nomascus leucogenys (adult female bred at Khao Kheow Forest & Wildlife Park, Thailand), hoolock gibbon Hoolock hoolock (adult female bred at Bangabandhu Sheikh Mujib Safari Park, Bangladesh), white-handed gibbon Hylobates lar (adult male bred at Sabae City Nishiyama Zoo, Japan), siamang Symphalangus syndactylus (a female infant stillborn in Hirakawa Zoo, Japan), and rhesus macaque Macaca mulatta (adult male bred at KUPRI).

Experiments involving DNA manipulations

Genomic DNA extraction from blood samples, PCR assays, cloning of DNA fragments into plasmids, and Southern hybridization analysis were performed as described previously (Koga et al., 2007, 2011). Fluorescence in situ hybridization (FISH) analysis of chromosomes was performed following previously described procedures (Hirai et al., 2005). Specific conditions are described below for each case.

RESULTS

Identification of SVA-like retrotransposons

SVA is considered to have been formed by the fusion of three genetic elements. The starting point of the present study was the supposition that other composite transposons may exist that were generated by the same or similar mechanisms but comprise different genetic elements. To detect such composite transposons, we first ran the discontinuous MegaBLAST program against genome databases of human and other primate species by using various parts of the SVA sequence as queries. Next, using the BioPython package, we extracted sequence blocks containing the aligned regions and their 2000-bp left-flanking and 2000-bp right-flanking regions. We then built and refined consensus sequences from these sequence data using the RepeatModeler program. Finally, we masked the obtained consensus sequences using the RepeatMasker program and searched the unmasked regions against GenBank and other general databases.
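The flank-extraction step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' script: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and the window is clipped at the ends of the scaffold, a detail the text does not specify.

```python
def flanked_window(start, end, seq_len, flank=2000):
    """Return the (lo, hi) window covering a hit plus its flanks.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    The window is clipped so that it never runs off the sequence.
    """
    if not (0 <= start < end <= seq_len):
        raise ValueError("hit coordinates fall outside the sequence")
    return max(0, start - flank), min(seq_len, end + flank)


def extract_block(sequence, start, end, flank=2000):
    """Slice the aligned region together with its left and right flanks."""
    lo, hi = flanked_window(start, end, len(sequence), flank)
    return sequence[lo:hi]
```

A hit at positions 5000 to 5300 on a 100-kb scaffold yields the window 3000 to 7300, whereas a hit near a scaffold end yields a shorter block.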

These analyses revealed two sets of multicopy sequences in the white-cheeked gibbon database, whose structure partly differed from that of SVA. These sequences comprised three main parts, two of which were equivalent to the VNTR and Alu regions of SVA. The third component was part of PTGR2 in the first set of multicopy sequences, and parts of the L1 and Alu transposons in the second set. The latter was identical to the LAVA transposon reported recently (Carbone et al., 2012) and will not be described here. We designated the former sequences as PVA.

Structure and nucleotide sequence of PVA

The structure of PVA is illustrated in Fig. 1. The nucleotide sequence of PVA, except for the VNTR region, is shown in Figs. 2 and 3. We excluded the VNTR region because of its hypervariability in length. PVA comprises three main parts (the PTGR2, VNTR, and Alu regions) and three smaller sequence blocks, as shown in Fig. 1.

Fig. 1.

Structure of the PVA transposon. PVA comprises the elements shown at the bottom line. The top line is the exon-intron structure of PTGR2. A portion that includes exon 4, intron 4, and exon 5 is magnified in the second line. Note that the components are not to scale.

Fig. 2.

Sequence comparisons to determine the border between SVA2-U and the PTGR2 region in PVA, and between SVA2-U and the SINE region in SVA. Asterisks indicate identical nucleotides at each site. The SVA2-U sequence is underlined. GT and AG dinucleotides that may correspond to splicing donor and acceptor sites, respectively, are shown by double underlines. The AATAAA nucleotide block that is likely to function as a poly(A) addition signal is shown by an open box. A. Comparison between the 5' part of the SINE region of gibbon SVA (SVA_Nom), the 5' part of the PTGR2 region of PVA, and SVA2-U. A typical PVA sequence (GL397299.1: 1277922-1278912) and a typical SVA sequence (GL397269.1: 33184401-33186079) were extracted from the gibbon database. B. Comparison between the 5' part of the SINE region of gibbon SVA (SVA_Nom) and the corresponding region of human HERV-K. The HERV-K sequence was extracted from Repbase Update (nucleotides 7412–7536 of the HERV-K entry). C. Comparison between the PTGR2 region of PVA and the corresponding region of gibbon PTGR2. The PTGR2 sequence was extracted from the gibbon database (GL397280.1: 10962198-10962543).

Fig. 3.

Sequence comparison of the Alu region between gibbon PVA and human SVA subfamilies. Sequences of human SVA subfamilies were extracted from their respective entries in Repbase Update. Asterisks indicate identical nucleotides at each site. The sequence identity to the PVA consensus, calculated by excluding insertions/deletions, is shown after the sequence of each SVA subfamily.

Composition of the PTGR2 region

The PTGR2 region is a partial PTGR2 sequence (RefSeq accession number NW_003501389, 10962229–10962533) and comprises the entire sequence of “canonical” exon 4 and the first 113 bp of intron 4. This “partial intron” is followed by a poly(A) sequence and contains a nucleotide block (AATAAA), which likely functions as a poly(A) addition signal. No part of the PVA sequence had any meaningful sequence similarity to the SINE region of SVA.

Sequence comparison of the Alu region between PVA and SVA

Human SVA copies have been grouped into six subfamilies: SVA_A to SVA_F (Wang et al., 2005). These subfamilies correspond to clusters in the SVA phylogenetic tree that was constructed based on the differentiation of the SINE region from the original HERV-K element (Wang et al., 2005). We compared the sequence of the Alu region of PVA with those of the six subfamilies, and found the highest similarity to the SVA_A subfamily (Fig. 3).
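The identity values in Fig. 3 are calculated by excluding insertions/deletions. One way to compute such a value from a pairwise alignment is sketched below. The function name and the use of "-" as the gap character are assumptions of this sketch, and the example sequences in the usage note are toy strings, not PVA or SVA data.

```python
def identity_excluding_indels(aligned_a, aligned_b, gap="-"):
    """Fraction of identical sites over alignment columns with no gap.

    Both inputs must be rows of the same alignment (equal length).
    Columns where either row has a gap are skipped, so indels do not
    lower the identity value.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # indel column: excluded from the calculation
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared
```

For the toy rows "ACGT-A" and "ACCTTA", five columns are compared and four match, giving an identity of 0.8.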

Junction of the VNTR and PTGR2 regions

PVA carries an intervening sequence (TGCAACCTTCCAAGTGTGAAGTGACAGCCTT) between the VNTR and PTGR2 regions. Sequence comparison of PVA and SVA2 revealed the origin of this intervening sequence.

The SVA2 element comprises VNTR (nucleotides 1–120 of the SVA2 entry in Repbase Update) and a 97-bp nonrepeat sequence located downstream (nucleotides 121–217). We designated the nonrepeat sequence as SVA2-U. The intervening sequence of PVA was almost identical to the first 31 bp of SVA2-U (Fig. 2A), suggesting the participation of SVA2 in the formation of PVA. In addition, the PTGR2 region that flanks the intervening sequence follows an AG dinucleotide block that functions as the splicing acceptor site in human PTGR2 (Fig. 2C).

PVA and SVA coexist in the gibbon genome. Thus, we extracted SVA sequences from the gibbon database to examine whether gibbon SVA copies have an identical or similar intervening sequence between their VNTR and SINE regions. We obtained a total of 30 SVA copies and found that all these copies had an intervening sequence almost identical to that of PVA. In addition, the border between the intervening sequence and the SINE region followed the GT–AG rule (Fig. 2, A and B). Furthermore, we examined the presence of SVA copies that carry this intervening sequence in the human genome. A homology search with the BLAST program (-task blastn-short) using SVA2-U as a query revealed 60 SVA copies, and all of these were members of the SVA_A subfamily.
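Two of the sequence features described in this section can be checked mechanically: the length of the intervening sequence and conformity of a candidate intron to the GT–AG rule. The sketch below is illustrative only. The intervening sequence is copied from the text, the function name is hypothetical, and the intron strings in the usage note are invented toy sequences.

```python
# Intervening sequence between the VNTR and PTGR2 regions, as given in the text.
INTERVENING = "TGCAACCTTCCAAGTGTGAAGTGACAGCCTT"


def follows_gt_ag(intron):
    """True if the sequence starts with GT (donor) and ends with AG (acceptor)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


intervening_length = len(INTERVENING)
```

Here `intervening_length` evaluates to 31, matching the 31-bp figure in the text. A toy intron such as "GTAAGTTTCCAG" passes the check, and "ATAAGTTTCCAG" does not.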

Estimation of copy number

We extracted individual PVA sequences from the gibbon genome database and analyzed them for the copy number. Using the sequence of the PTGR2 region as a query, we conducted a MegaBLAST homology search at default settings. This resulted in a list of 157 aligned sequences with E-values of < 2 × 10⁻¹². One of these 157 sequences contained the PTGR2 gene region, and was not used in our subsequent analyses. We extracted the remaining 156 aligned sequences along with their flanking regions. In most cases, the flanking regions were up to 2000 bp long on each side, with additional regions being used as necessary. These were subsequently examined with the RepeatMasker and UCSC Blat programs to delimit the PVA regions and their TSD sequences by comparing Blat hit results from multiple hominoid species.
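Selecting hits by E-value can be sketched as below. The sketch assumes that the search results were written in the standard 12-column tabular BLAST+ format, in which the subject ID is column 2, the subject start and end are columns 9 and 10, and the E-value is column 11. The output format actually used in the study is not stated, and the function name and example records are hypothetical.

```python
def filter_hits(lines, max_evalue=2e-12):
    """Keep tabular BLAST hits whose E-value is below the threshold.

    Returns a list of (subject_id, subject_start, subject_end, evalue).
    Blank lines and comment lines starting with '#' are ignored.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of the 12-column format
        if evalue < max_evalue:
            kept.append((fields[1], int(fields[8]), int(fields[9]), evalue))
    return kept


# Hypothetical records for illustration; not data from the study.
example = [
    "query\tscafA\t95.0\t300\t15\t0\t1\t300\t1000\t1299\t1e-50\t400",
    "query\tscafB\t80.0\t60\t12\t0\t1\t60\t500\t559\t1e-05\t50",
]
hits = filter_hits(example)
```

With the two example records, only the hit on the hypothetical scaffold "scafA" passes the threshold.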

From the 156 sequences used as raw data to be analyzed, we excluded 15 sequences whose insertion sites could not be uniquely identified because of structural changes in the gibbon and/or the reference species. In the gibbon, most structural changes were deletions of flanking region(s), and these are likely to be deletions reported to accompany the SVA transposition (Lee et al., 2012). We also excluded 8 sequences that contained an N nucleotide (a nucleotide not successfully converted) or an N string within the PTGR2 region, 46 sequences that contained N('s) between the PTGR2 region and its flanking chromosomal region, and 2 sequences that lacked large parts of the PTGR2 region because of 5' truncation. There were overlaps among these exclusions, and the number of sequences used for this analysis was 93.

In addition to the search for PVA, a MegaBLAST search was conducted against the gibbon genome database using the SINE region sequence of human SVA (nucleotides 1119–1640 of the SVA entry in Repbase Update) as a query. We found 30 SVA copies, and this result is consistent with that of a previous study (Wang et al., 2005). Thus, we inferred that the copy number of PVA is about three times larger than that of SVA.
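The counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable and function names are illustrative.

```python
RAW_HITS = 156    # aligned sequences after setting aside the PTGR2 gene itself
RETAINED = 93     # PVA copies kept for analysis
SVA_COPIES = 30   # SVA copies found with the SINE-region query

# Sizes of the exclusion categories, as listed in the text.
EXCLUSIONS = {
    "insertion site not uniquely identified": 15,
    "N within the PTGR2 region": 8,
    "N between the PTGR2 region and its flank": 46,
    "large 5' truncation of the PTGR2 region": 2,
}


def exclusion_summary(raw, retained, exclusions):
    """Compare the sum of category sizes with the number of distinct exclusions."""
    flagged = sum(exclusions.values())   # category memberships
    distinct = raw - retained            # sequences actually removed
    return {"flagged": flagged, "distinct": distinct, "surplus": flagged - distinct}


summary = exclusion_summary(RAW_HITS, RETAINED, EXCLUSIONS)
pva_to_sva_ratio = RETAINED / SVA_COPIES
```

The category sizes sum to 71, whereas 63 sequences were removed. The surplus of 8 is what the stated overlaps among exclusion categories would produce. The ratio of 93 to 30 is 3.1, in line with "about three times".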

TSDs and length of PVA

In addition to estimating the copy number, the results of the homology search described earlier enabled us to analyze length variations, identify TSD sequences, and even categorize the PVA copies according to the variant types previously defined for SVA (Damert et al., 2009). Table 1 shows these features of all the 93 PVA copies, and Fig. 4 is a histogram illustrating the length variation of PVA. The length of PVA was distributed between 323 bp and 2478 bp, depending on the VNTR length, poly(A)-tail length, and structural changes such as 5' truncation. The size range of TSDs and frequent occurrence of 5' truncation suggest the dependence of the PVA transposition reaction on the L1 transposition machinery.
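The PVA lengths in Table 1 can be related to the scaffold coordinates in its Location column. The sketch below assumes that the coordinates are 1-based and inclusive, which the table does not state explicitly, and checks three rows copied from Table 1 (Nos. 1, 46, and 93). The names are illustrative.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# (No., start, end, reported PVA length in bp), copied from Table 1.
ROWS = [
    (1, 794473, 794795, 323),
    (46, 1277922, 1278912, 991),
    (93, 917928, 920405, 2478),
]

# Row numbers whose coordinate span disagrees with the reported length.
mismatches = [no for no, start, end, reported in ROWS
              if span_length(start, end) != reported]
```

For these three rows the list of mismatches is empty, and the shortest and longest entries (323 bp and 2478 bp) correspond to the range quoted in the text.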

Table 1. Characterization of PVA copies found in the gibbon genome

No.  Type             PVA (bp)  TSD (bp)  Location
1    5' truncation    323       11        GL397437, +, 794473 – 794795
2    5' truncation    324       9         GL397437, +, 803094 – 803417
3    5' truncation    366       13        GL397389, +, 1156295 – 1156660
4    5' truncation    382       4         GL397450, +, 71920 – 72301
5    5' truncation    391       21        GL397302, +, 9602076 – 9602466
6    5' truncation    399       14        GL397515, –, 313131 – 313529
7    5' truncation    401       14        GL397555, +, 195346 – 195746
8    5' truncation    402       18        GL397299, +, 2732709 – 2733110
9    5' truncation    417       11/12     GL397262, +, 13609205 – 13609621
10   5' truncation    432       23        ADFV01139252, –, 12256 – 12687
11   5' truncation    434       15        GL397340, –, 9706804 – 9707237
12   5' truncation    449       15        GL397261, –, 54947513 – 54947961
13   5' truncation    474       3         GL397287, –, 20289161 – 20289634
14   5' truncation    477       10        GL397367, +, 4311612 – 4312088
15   5' truncation    483       10        GL397359, –, 4224834 – 4225316
16   5' truncation    486       26        GL397382, +, 1672651 – 1673136
17   5' truncation    499       13        GL397322, –, 3845419 – 3845917
18   5' truncation    516       3         GL397268, +, 43078580 – 43079095
19   5' truncation    536       19        GL397381, +, 1103186 – 1103721
20   5' truncation    539       19        GL397326, –, 9270245 – 9270783
21   5' truncation    625       13        GL397301, +, 444008 – 444632
22   5' truncation    638       11        GL397368, +, 5768930 – 5769567
23   5' truncation    673       10        GL397477, –, 1710403 – 1711075
24   unknown          700       13        GL397335, –, 9618886 – 9619585
25   5' truncation    748       9         GL397323, –, 3593499 – 3594246
26   5' truncation    771       11        GL397370, +, 3420251 – 3421021
27   5' truncation    806       6         GL398191, +, 17016 – 17821
28   5' truncation    827       11        GL397437, –, 1941131 – 1941957
29   canonical        833       17        GL397277, +, 30334916 – 30335748
30   canonical        845       13        GL397279, +, 22935106 – 22935963
31   canonical        864       11        GL397537, –, 894729 – 895592
32   5' truncation    868       20        GL397504, –, 108248 – 109115
33   canonical        877       12        GL397312, +, 5975854 – 5976730
34   canonical        889       14/16     GL397272, –, 3879557 – 3880445
35   canonical        895       15        GL397328, –, 31778 – 32672
36   5' truncation    898       12        GL397275, +, 1489372 – 1490269
37   canonical        900       9         GL397479, +, 1316872 – 1317771
38   canonical        911       11        GL397296, +, 247203 – 248113
39   canonical        926       8/10      GL397273, +, 1205654 – 1206579
40   canonical        957       12        GL397288, –, 19518386 – 19519342
41   canonical        964       18        GL397299, +, 14146871 – 14147834
42   canonical        973       6         GL397266, +, 24563664 – 24564636
43   canonical        979       17        GL397275, +, 18454251 – 18455229
44   canonical        982       15        GL397385, +, 1453448 – 1454429
45   canonical        985       10        GL397327, +, 3471001 – 3471985
46   canonical        991       12        GL397299, +, 1277922 – 1278912
47   canonical        991       18        GL397318, –, 13178629 – 13179619
48   canonical        1003      14        GL397272, +, 37697076 – 37698078
49   canonical        1026      12        GL397355, +, 7377493 – 7378518
50   canonical        1040      13        GL397301, +, 3412318 – 3413357
51   canonical        1045      12/13     GL397442, –, 233512 – 234556
52   canonical        1054      10        GL397296, +, 605120 – 606173
53   5' truncation    1057      16        GL397285, +, 17358306 – 17359362
54   3' transduction  1070      16        GL397318, +, 1025408 – 1026477
55   3' transduction  1077      14        GL397305, +, 9677654 – 9678730
56   canonical        1088      13        GL397326, +, 14041076 – 14042163
57   canonical        1095      6         GL397306, +, 4087605 – 4088699
58   5' truncation    1111      12        GL397302, –, 9342609 – 9343719
59   canonical        1111      11        GL397499, –, 735571 – 736681
60   5' truncation    1135      4         GL397283, +, 24873343 – 24874477
61   3' transduction  1139      12        GL397303, –, 18558538 – 18559676
62   canonical        1139      19        GL397372, +, 951128 – 952266
63   canonical        1144      14        ADFV01136848, +, 26939 – 28082
64   canonical        1155      16/17     GL397267, +, 30459064 – 30460218
65   canonical        1169      11        GL397453, +, 1546227 – 1547395
66   3' transduction  1185      10        GL397356, +, 2962496 – 2963680
67   canonical        1192      6         GL397584, +, 90896 – 92087
68   canonical        1205      17        GL397373, +, 1913876 – 1915080
69   canonical        1217      20        GL397475, +, 1281245 – 1282461
70   canonical        1259      14        GL397332, +, 9132309 – 9133567
71   3' transduction  1275      16/17     GL397272, +, 3000911 – 3002185
72   canonical        1277      14        GL397306, +, 18017623 – 18018899
73   canonical        1284      16        GL397261, +, 61072814 – 61074097
74   canonical        1320      14        GL397266, +, 8121916 – 8123235
75   canonical        1324      7         GL397544, –, 459850 – 461173
76   canonical        1327      20        GL397301, +, 3434467 – 3435793
77   canonical        1362      12        GL397316, –, 2729091 – 2730452
78   canonical        1372      15        GL397309, +, 10757450 – 10758821
79   canonical        1380      16/18     GL397274, +, 14619951 – 14621330
80   5' transduction  1398      14        GL397393, –, 2492234 – 2493631
81   canonical        1401      12        GL398298, +, 10757 – 12157
82   3' transduction  1416      13        GL397287, +, 8426737 – 8428152
83   5' truncation    1426      10        GL397281, –, 22760805 – 22762230
84   3' transduction  1443      14/15     GL397334, –, 9508105 – 9509547
85   canonical        1658      12        GL397276, –, 32330003 – 32331660
86   canonical        1705      16        GL397425, +, 1494179 – 1495883
87   3' transduction  1708      4         GL397285, –, 26879474 – 26881181
88   5' truncation    1891      13        GL397303, –, 6304456 – 6306346
89   canonical        1902      14        GL397266, +, 47555703 – 47557604
90   5' truncation    1982      15        GL397402, +, 2093059 – 2095040
91   5' transduction  1995      15        GL397544, +, 87629 – 89623
92   5' transduction  2030      13        GL397314, –, 10577965 – 10579994
93   canonical        2478      12        GL397381, –, 917928 – 920405

Fig. 4.

Size distribution of PVA. The histogram shows the size distribution among 93 PVA copies.

Distribution of PVA among species

Using the PTGR2 region as a query, we repeatedly ran the MegaBLAST program with default settings against human and other primate genome databases. We examined five species of the family Hominidae (human, chimpanzee, bonobo, gorilla, and orangutan) and seven species outside the superfamily Hominoidea (rhesus macaque, Hamadryas baboon, marmoset, bushbaby, mouse lemur, tarsier, and squirrel monkey). No evidence for multicopy occurrence was obtained in these species. Within the family Hylobatidae, a public genome database is available only for the white-cheeked gibbon. Therefore, MegaBLAST analysis was not applicable to other gibbon species.

Two lines of experimental analysis provided clear evidence for the presence of PVA as multiple copies in the white-cheeked gibbon and other gibbon species. The first was FISH analysis of white-cheeked gibbon chromosomes. We cloned part of human PTGR2 that roughly corresponded to the PTGR2 region of PVA and used it as a probe. As shown in Fig. 5, we observed yellow and red stripes in all 52 chromosomes. These stripe patterns indicate that PVA is interspersed throughout the white-cheeked gibbon genome.

Fig. 5.

FISH analysis of white-cheeked gibbon chromosomes. Probe DNA was labeled with fluorescein isothiocyanate (FITC), and overnight hybridization was performed under moderately stringent conditions. Yellow indicates hybridization signals from the labeled probe, and red indicates regions to which the probe did not hybridize. The horizontal bar indicates 10 μm.

The second experiment was genomic Southern hybridization using the same probe as that used in FISH analysis. The family Hylobatidae comprises four genera (Hoolock, Hylobates, Nomascus, and Symphalangus), and we included one species from each genus in our analysis. In the autoradiogram obtained (Fig. 6), we observed multiple bands in the lanes for the gibbons and no band for the four Hominidae species or rhesus macaque. The first lane, which contained a tenfold greater amount of human genomic DNA, produced a single band at 5.8 kb. The single-copy PTGR2 is responsible for this band because this size is consistent with that expected from the genomic sequence in the human genome database.

Fig. 6.

Genomic Southern blot analysis for the distribution of PVA among species. Four species of the family Hominidae, four of the family Hylobatidae, and one outside the superfamily Hominoidea were examined. The family Hylobatidae comprises four genera, and one species from each genus was included in this analysis. These four species are shown with their scientific names. Genomic DNA was digested with the HindIII restriction enzyme. The leftmost lane for humans contained 5.0 μg of genomic DNA, and the other lanes contained 0.5 μg each.

DISCUSSION

In the present study, we first identified the composite retrotransposon PVA present in the white-cheeked gibbon genome by database mining, and subsequently confirmed its presence in all four gibbon genera by cloning and hybridization experiments. We propose two possible explanations regarding the origin of PVA (Fig. 7): (1) PVA was derived from SVA, and (2) PVA was formed independently of SVA.

Fig. 7.

Two models for the generation of PVA. Abbreviations are: C, CT-rich repeats; A, Alu region; V, VNTR region; U, SVA2-U sequence; E4, exon 4; E5, exon 5. A. Model for the derivation of PVA from SVA. B. Model for the formation of PVA independently of SVA.

Details of the first hypothesis are that PVA was formed by template switching between a transcript from an SVA copy of the SVA_A subfamily and a transcript from PTGR2. The PTGR2 fragment was terminated then or later, or had already been terminated, at the internal position of intron 4. A similar mechanism has been proposed for the generation of some L1-related composite transposons (Buzdin, 2004).

The second hypothesis assumes the occurrence of an evolutionary intermediate comprising Alu and SVA2, and postulates a fusion of this intermediate to PTGR2 by aberrant splicing. One prominent feature of the PTGR2 region is that its 5' half is canonical exon 4 of PTGR2; the 5' end of the PTGR2 region exactly corresponds to the first nucleotide of exon 4. In addition, the junction between the 31-bp intervening sequence and the PTGR2 region strictly follows the GT–AG rule for splicing (Fig. 2). This provides strong support for the second hypothesis.

The second hypothesis logically requires another assumption that the evolutionary intermediate was once inserted into the upstream region of exon 4 of PTGR2; however, such a mutant allele or chromosomal haplotype has not been found. It is easy, however, to postulate that such a variant was subject to negative selection because PTGR2 has been shown to be an essential enzyme in humans (Zhang et al., 2003; Wu et al., 2008). Extinction by random genetic drift is also a possible explanation.

Recently, another composite retrotransposon, designated as LAVA, with the same basic structure as that of SVA has been found (Carbone et al., 2012). SVA, LAVA, and PVA comprise three main parts, two of which are common to these three transposons: the Alu region in the 5' portion and the VNTR region in the internal portion. The origin of the component in the 3' portion is SINE for SVA, L1 and Alu for LAVA, and PTGR2 for PVA.

The presence of three different retrotransposons with the same basic structure in a single hominoid genome raises the possibility that there may be other unknown retrotransposons and even the possibility of future generation of new retrotransposons. In this context, PVA is not merely a third example. Its 3'-portion component is derived from a single-copy gene, whereas those of SVA and LAVA are derived from transposons. Thus, the discovery of PVA suggests that a wider range of genetic elements than currently supposed can serve as sources of a 3'-portion component of this type of composite retrotransposon.

ACKNOWLEDGMENT

We are grateful to the Great Ape Information Network for tissue samples of gorilla and orangutan, and Ms. Israt Jahan for a sample of hoolock gibbon. We thank the Gibbon Genome Sequencing Consortium for making their data publicly available, and the BCM-HGSC for providing the draft genome assembly nomLeu1. This work was supported by Grants-in-Aid (24370098 and 23657165 to AK, and 22247037 to HH) from MEXT of Japan.

REFERENCES
 
© 2012 by The Genetics Society of Japan