Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full paper
Establishment of a genome-wide and quantitative protocol for assessment of transcriptional activity at human retrotransposon L1 antisense promoters
Koichi Ishiguro, Saneyuki Higashino, Hideki Hirakawa, Shusei Sato, Yasunori Aizawa
Journal, open access

2017 Volume 92 Issue 5 Pages 243-249

ABSTRACT

Long interspersed element 1 (L1) retrotransposon sequences are widespread in the human genome, occupying ~500,000 locations. The majority of L1s have lost their retrotransposition capability, although a significant population of human L1s maintains bidirectional transcriptional activity from the internal promoter. While the sense promoter drives transcription of the entire L1 mRNA and leads to L1 retrotransposition, the antisense promoter (ASP) transcribes L1-gene chimeric RNAs that include neighboring exon sequences. Activation mechanisms and functional impacts of L1ASP transcription are thought to vary at every L1ASP location. To explore the locus-specific regulation and function of L1ASP transcription, quantitative methodology is necessary for identifying the genomic positions of highly active L1ASPs on a genome-wide scale. Here, we employed deep-sequencing techniques and built a 3’ RACE-based experimental and bioinformatics protocol, named the L1 antisense transcriptome protocol (LATRAP). In LATRAP, the PCR primer and the read mapping scheme were designed to reduce false positives and negatives, which may have been included as hits in previous cloning studies. LATRAP was here applied to the A549 human lung cancer cell line, and 313 L1ASP loci were detected to have transcriptional activity but differed in the number of mapped reads by four orders of magnitude. This indicates that transcriptional activities of the individual L1ASPs can vary greatly and that only a small population of L1ASP loci is active within individual nuclei. LATRAP is the first experimental method for ranking L1ASPs according to their transcriptional activity and will thus open a new avenue to unveiling the locus-specific biology of L1ASPs.

INTRODUCTION

Long interspersed element 1 (L1) is one of the major classes of mammalian retrotransposons (Deininger and Batzer, 2002). L1-derived sequences are recognized at ~500,000 loci throughout the human reference genome and account for ~17% of the total base pairs (Lander et al., 2001). Most L1s are 5′-truncated, but there are ~6,000 full-length L1 elements spread throughout the human haploid genome (Penzkofer et al., 2005). The full-length element is ~6,000 bp and contains a bidirectional internal promoter in the 5’ region (Fig. 1A). Activation of the sense-strand promoter leads to L1 retrotransposition. The sense promoter transcribes a bicistronic L1 RNA containing two open reading frames (ORF1 and ORF2) (Dombroski et al., 1991). The proteins encoded by both ORFs, which are indispensable for L1 retrotransposition (Moran et al., 1996; Kazazian and Moran, 1998), form an L1 ribonucleoprotein (RNP) complex with L1 mRNA (Hohjoh and Singer, 1996; Doucet et al., 2010). Endonuclease and reverse transcriptase activities of the ORF2 protein in the L1 RNP drive the initial phases of retrotransposition, by DNA nicking followed by the simultaneous processes of synthesis and integration of L1 cDNA into the nicked strand of the genomic DNA (Luan and Eickbush, 1995; Feng et al., 1996; Moran et al., 1996). The DNA repair machinery ensures that the L1 cDNA sequence is also generated on the other genomic DNA strand, completing L1 retrotransposition. There are 40–50 full-length L1 elements encoding retrotransposition-competent L1RNP machinery per haploid human genome (Brouha et al., 2003; Beck et al., 2010).

Fig. 1.

Structure and transcription of the L1 antisense promoter (ASP). (A) Structure of full-length L1 showing the initiation sites of the sense promoter (SP) and ASP. The internal promoter initiates transcription at the 5′ terminus in the sense direction (with the SP) and within the region between 379 and 498 bp relative to the 5′ terminus in the antisense direction (with the ASP). As denotes a poly(A) tail. (B) Transcripts generated by the ASP are categorized into six types (Nigumann et al., 2002). The diversity originates from two splice donors (at 347 and 262 bp) and two splice acceptors (at 232 and 116 bp), which are situated downstream of the transcription initiation region (379–498). Unspliced transcripts belong to Type I. Spliced transcripts that use either acceptor are classified as Type II, III or IV, depending on the splice donor-acceptor combination. Transcripts that are spliced at either donor and use any acceptor outside the L1 promoter are classified as Type V or VI. The gray-shaded area shows positions targeted by ASP-specific 3’ RACE primers in this and previous studies: primers for base pairs between 362 and 379 (this study; green), between 64 and 82 (Cruickshanks and Tufarelli, 2009), between 75 and 93 (Cruickshanks and Tufarelli, 2009) and between 95 and 117 (Macia et al., 2011). Basepair numbering is based on the L1HS sequences in RepBase (Jurka et al., 2005).

While the sense promoters of all L1s produce a single type of RNA, L1 mRNA, as described above, the antisense promoters (ASPs) at different L1 loci transcribe RNAs with distinct 3’ sequences (Speek, 2001; Nigumann et al., 2002; Wheelan et al., 2005; Matlik et al., 2006; Cruickshanks and Tufarelli, 2009; Faulkner et al., 2009; Macia et al., 2011; Criscione et al., 2016). Activation of the ASP starts transcription in the middle of the ~900-bp promoter, between nucleotides 379 and 498 relative to the L1 5′ terminus (Yang and Kazazian, 2006), and transcribes the genomic region flanking the L1 5’ end (Fig. 1A) (Speek, 2001). When L1s reside in introns or upstream of genes, their ASPs can act as alternative promoters for those genes, producing chimeric RNAs by splicing between L1ASP and the downstream exons of the gene. This splicing occurs efficiently, because there are two strong splice donors downstream of the transcription initiation region on the L1ASP (Fig. 1B) (Nigumann et al., 2002). This locus-specific transcriptional activation of L1ASPs and resultant chimeric L1-gene transcripts potentially influences the expression of neighboring genes, leading to perturbation of signaling pathways in which the neighboring genes are involved, as observed with an intronic L1ASP in the MET oncogene (Weber et al., 2010). More recently, it was reported that an ASP-driven transcript contains a translatable ORF that initiates from an AUG within the L1ASP sequence (Denli et al., 2015). The L1ASP-derived protein seems to have a regulatory role in retrotransposition. Additionally, however, one can speculate that the protein products derived from different L1ASP loci have distinctive functionality. The C-terminal sequences of the proteins encoded on L1ASP-derived transcripts could be identical to the C-terminal portions of proteins which are encoded in exons transcribed by the L1ASP. It is possible that the C-terminal portions include functional domains. 
Taking these points into consideration, a deep understanding of the functional impacts of L1ASP activation on the transcriptome and proteome requires a methodology for locating highly active L1ASPs on a genome-wide scale.

For this purpose, we established a method called the L1 antisense transcriptome protocol (LATRAP). LATRAP employs 3′-rapid amplification of cDNA ends (3′ RACE) to amplify L1ASP-driven transcripts, followed by deep sequencing of the resulting PCR products. Primers for 3′ RACE-PCR and a bioinformatics scheme for read mapping were carefully designed to identify genuine transcriptionally active L1ASPs. In this study, we applied LATRAP to a human lung cancer cell line, A549, and identified L1ASP loci that differ in the number of mapped reads. This allows us to rank L1ASPs in accordance with the transcriptional activity. LATRAP is a useful genome-wide experimental framework for unveiling locus-specific L1ASP biology in the near future.

MATERIALS AND METHODS

Cell culture

A549 cells were cultured in Dulbecco’s modified Eagle’s medium containing 10% (v/v) fetal bovine serum (Biowest) at 37 ℃ in a 5% CO2 atmosphere.

LATRAP of A549 cells

Total RNA was isolated from A549 cells using an RNeasy kit (Qiagen) and treated with RNase-free DNase I (Qiagen) during extraction. Total RNA (1 μg) was reverse-transcribed in a 20-μl reaction with the LATRAP RT primer (5’-TTTATCACCATTTCCTCATTCTCATTTTTTTTTTTTTTTTT) using a Superscript III First-Strand cDNA Synthesis kit (Invitrogen). PCR amplification was performed in a 50-μl reaction with 0.4 μl cDNA, 1.25 U PrimeSTAR GXL polymerase (TaKaRa), 1 × PrimeSTAR GXL Buffer (TaKaRa), 0.2 mM dNTPs, 0.2 μM L1ASP primer (5’-GAGATTCCGTGGGCGTAG) and 0.2 μM LATRAP PCR primer (5’-TTTATCACCATTTCCTCATTCTCA). The following thermocycling conditions were used: 98 ℃ for 30 sec; 5 cycles of 98 ℃ for 10 sec and 68 ℃ for 3 min; 35 cycles of 98 ℃ for 10 sec, 60 ℃ for 15 sec and 68 ℃ for 30 sec; 68 ℃ for 7 min. PCR products were resolved by 1% agarose gel electrophoresis, and DNA fragments in the range of 300–600 bp were purified using a Gel Extraction kit (Qiagen).
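
The relationship between the two LATRAP primers can be checked directly from the sequences given above. The following is a minimal sketch (the function and variable names are illustrative and not part of the original protocol) confirming that the LATRAP PCR primer corresponds to the 5′ adapter portion of the RT primer and that the remainder of the RT primer is an oligo(dT) tract.

```python
# Primer sequences copied from the protocol text (written 5' -> 3').
RT_PRIMER = "TTTATCACCATTTCCTCATTCTCATTTTTTTTTTTTTTTTT"
PCR_PRIMER = "TTTATCACCATTTCCTCATTCTCA"


def split_rt_primer(rt_primer: str, adapter: str) -> tuple[str, str]:
    """Split an anchored RT primer into its 5' adapter and its 3' tail.

    Raises ValueError if the RT primer does not begin with the adapter.
    """
    if not rt_primer.startswith(adapter):
        raise ValueError("RT primer does not begin with the adapter sequence")
    return adapter, rt_primer[len(adapter):]


adapter, tail = split_rt_primer(RT_PRIMER, PCR_PRIMER)

# The tail should consist only of T residues (the oligo(dT) part that
# anneals to poly(A) tails during reverse transcription).
is_oligo_dt = len(tail) > 0 and set(tail) == {"T"}

print(f"adapter length: {len(adapter)} nt")
print(f"oligo(dT) tail length: {len(tail)} nt, all T: {is_oligo_dt}")
```

Because every cDNA carries the adapter at its 3′ end, a single adapter-specific PCR primer paired with the L1ASP primer is sufficient to amplify transcripts regardless of their downstream sequence.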

Purified DNA was sequenced on a 454/Roche GS-FLX sequencer. The 454 sequencing library was prepared using a GS Titanium Rapid Library Preparation kit (Roche Applied Science). Amplification and sequencing of libraries were performed using GS FLX Titanium Sequencing kits on a Genome Sequencer FLX Instrument (Roche/454 Life Sciences). Raw sequencing data are available under accession DRR072354 (DNA Data Bank of Japan: http://www.ddbj.nig.ac.jp/).

Reads were mapped using the BLAT search program for the human reference genome (GRCh37/hg19, Feb. 2009) (Kent, 2002). When the score of the second hit at a genomic locus was < 90% of that of the first hit, we defined the read as “uniquely mapped” to the locus. Of the uniquely mapped reads, we collected those with 5′- or 3′-terminal sequences that were well aligned to the first 500 nt of the human-specific L1 (L1HS) consensus sequence (Jurka et al., 2005). Finally, we defined LATRAP reads as those that were uniquely mapped to the reference genome with at least one gap of > 49 bp. Virtually all of these reads represent splicing events of the original transcripts.
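
The read selection rules described above can be sketched as a small classifier. This is an illustrative reconstruction, not the authors' actual pipeline: the `Read` record, its field names, and the treatment of single-hit reads as unique are assumptions made for the example, and the BLAT scores and gap lengths are assumed to have been parsed beforehand.

```python
from dataclasses import dataclass

UNIQUE_SCORE_PERCENT = 90  # second-best score must be < 90% of the best score
MIN_GAP_BP = 50            # "at least one gap of > 49 bp"


@dataclass
class Read:
    """Hypothetical summary of one read's BLAT results."""
    name: str
    hit_scores: list[int]     # scores of all genomic hits
    best_hit_gaps: list[int]  # gap lengths (bp) within the best alignment
    has_l1asp_end: bool       # 5' or 3' end aligns to the first 500 nt of L1HS


def is_uniquely_mapped(hit_scores: list[int]) -> bool:
    """Apply the second-hit < 90% of first-hit rule (integer arithmetic)."""
    if not hit_scores:
        return False
    ranked = sorted(hit_scores, reverse=True)
    if len(ranked) == 1:
        return True  # assumption: a read with a single hit counts as unique
    return ranked[1] * 100 < UNIQUE_SCORE_PERCENT * ranked[0]


def classify(read: Read) -> str:
    """Return 'not_unique', 'no_l1asp', 'latrap' or 'unspliced'."""
    if not is_uniquely_mapped(read.hit_scores):
        return "not_unique"
    if not read.has_l1asp_end:
        return "no_l1asp"
    if any(gap >= MIN_GAP_BP for gap in read.best_hit_gaps):
        return "latrap"
    return "unspliced"


# Made-up example reads, for illustration only.
examples = [
    Read("r1", [300, 120], [1500], True),  # unique, L1ASP end, long gap
    Read("r2", [300, 280], [1500], True),  # second hit too close to the first
    Read("r3", [250], [3], True),          # unique but no long gap
]
for read in examples:
    print(read.name, classify(read))
```

Only reads classified as `latrap` in this sketch correspond to the LATRAP reads used for calling active L1ASPs.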

L1 bioinformatics

To determine the numbers of L1HS-, L1PA2- and L1PA3-derived ASPs in the human reference genome (hg19), a BLAST search was performed with default settings against all L1HS, L1PA2 and L1PA3 sequences in the RepeatMasker table of the UCSC Table Browser (Karolchik et al., 2004), using the 5′-terminal 1,000-bp sequence of L1HS as a query. The resulting hits were subjected to a second BLAST search against the sequence of the ASP initiation region (379 to 498 relative to the L1 5′ terminus; Fig. 1A) (Yang and Kazazian, 2006), and the obtained hits were counted. Basepair numbering in this study is based on the L1HS sequences in RepBase (Jurka et al., 2005).

RESULTS AND DISCUSSION

Primer design for LATRAP PCR

In the first step of LATRAP, total RNA is reverse-transcribed using the LATRAP RT primer and then amplified by PCR using the LATRAP PCR primer and the L1ASP-specific primer (see Materials and Methods for the primer sequences). The LATRAP PCR primer is identical to the 5′ part of the RT primer, following the principle of 3′ RACE experimental design. The L1ASP primer is designed to target the region between the ASP transcription initiation region (between nucleotides 498 and 379 relative to the 5′ terminus of full-length L1) (Yang and Kazazian, 2006) and the first splice donor site downstream of the ASP initiation region (at nucleotide 347) (Fig. 1B) (Nigumann et al., 2002). This specific targeting allows amplification of cDNAs from all six splice isoforms of poly(A)+ RNAs transcribed from the ASPs (Nigumann et al., 2002). Notably, previously reported cloning experiments of L1ASP-driven transcripts were also carried out by 3′ RACE, but they utilized L1ASP-specific primers targeting the region downstream of the two L1ASP splice acceptors (shown by black arrowheads in Fig. 1B) (Cruickshanks and Tufarelli, 2009; Macia et al., 2011). With these primers, only unspliced transcripts (Type I) or transcripts using both a splice donor and an acceptor within the L1ASP sequence (Types II, III and IV) could be cloned by RT-PCR. Therefore, L1ASP chimeric transcripts containing downstream gene exons (Types V and VI) could not be identified. Our primer design was expected to recover these false negatives of the previous studies.
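
The positional argument above can be illustrated with a short coordinate check. This is a simplified sketch under stated assumptions: primer sites and splice donors use the L1HS coordinates quoted in the text and in the Fig. 1 legend, the dictionary labels and function names are invented for the example, and a primer site is treated as retained only if it lies entirely above the donor used (exact splice junction boundaries are not modeled).

```python
# L1HS coordinates (bp from the L1 5' terminus) quoted in the text.
SPLICE_DONORS = (347, 262)

# 3' RACE primer target sites as (start, end); keys are illustrative labels.
PRIMER_SITES = {
    "this study": (362, 379),
    "Cruickshanks and Tufarelli 2009, primer 1": (64, 82),
    "Cruickshanks and Tufarelli 2009, primer 2": (75, 93),
    "Macia et al. 2011": (95, 117),
}

L1ASP_PRIMER = "GAGATTCCGTGGGCGTAG"


def retained_after_external_splicing(site: tuple[int, int], donor: int) -> bool:
    """True if the primer site survives splicing to an acceptor outside the L1.

    Antisense transcription runs from higher to lower L1 coordinates, so
    splicing from `donor` to an external acceptor (Types V and VI) removes
    the L1 sequence below the donor. Simplifying assumption: the site is
    retained only if it lies entirely above the donor position.
    """
    start, _end = site
    return start > donor


def can_amplify_types_v_and_vi(site: tuple[int, int]) -> bool:
    """True if the site is retained whichever of the two donors is used."""
    return all(retained_after_external_splicing(site, d) for d in SPLICE_DONORS)


for label, site in PRIMER_SITES.items():
    print(f"{label} {site}: Types V/VI amplifiable = {can_amplify_types_v_and_vi(site)}")

# Consistency check: the primer length matches the size of its target window.
start, end = PRIMER_SITES["this study"]
print(len(L1ASP_PRIMER) == end - start + 1)
```

Under these assumptions, only the primer site used in this study lies upstream of both splice donors, which is why it is expected to capture chimeric transcripts that the earlier primer positions would miss.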

Pilot LATRAP experiment in A549 cells

In this study, the 3’ RACE protocol was applied to total RNA extracted from A549 cells. The obtained PCR product was sheared, deep-sequenced using a 454/Roche GS-FLX sequencer, and mapped to the human reference genome using a BLAT search (GRCh37/hg19, Feb. 2009). This yielded 257,651 uniquely mapped reads, from which 525 L1ASP loci were identified as candidate loci of transcriptionally active positions (Table 1). Careful assessment of the mapping patterns revealed some false positives (Fig. 2). For example, a mapped read cluster, which starts in the L1 5′ region and extends into the flanking genomic region, was observed in the 9th intron of XPR1 (Fig. 2A, top and middle). This mapping pattern could reflect ASP-mediated transcription generating an unspliced transcript (Type I in Fig. 1B). However, the cluster of unspliced reads ended at consecutive As in a neighboring Alu element (Fig. 2A, bottom), raising another possibility that these reads resulted from internal hybridization of the LATRAP RT primer to the consecutive stretch of As in the middle of the nascent (unspliced) transcripts of the parental XPR1 at the reverse transcription step of the 3’ RACE procedure.
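
The internal-priming artifact described above can be screened for by looking for a run of consecutive As in the genomic sequence where an unspliced read cluster ends. The sketch below is illustrative only and is not the filter used in this study, which instead relies on spliced reads. The function names, the example sequences and the threshold of 10 As are assumptions; the sequence is assumed to be given in the sense of the transcript.

```python
def longest_a_run(seq: str) -> int:
    """Return the length of the longest run of consecutive A residues."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base == "A" else 0
        longest = max(longest, current)
    return longest


def possible_internal_priming(genomic_window: str, min_run: int = 10) -> bool:
    """Flag a read end if the genomic window around it contains an A-run.

    `genomic_window` is the transcript-sense genomic sequence at the 3' end
    of a read cluster; `min_run` is an assumed threshold.
    """
    return longest_a_run(genomic_window) >= min_run


# Made-up sequences for illustration only.
a_rich_window = "GGCT" + "A" * 12 + "GTC"
ordinary_window = "GGCTAAGTCAAGGTCAATC"

print(possible_internal_priming(a_rich_window))    # True
print(possible_internal_priming(ordinary_window))  # False
```

A flagged window does not prove internal priming, since genuine polyadenylation sites can also lie near A-rich sequence; it only marks loci that merit manual inspection.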

Table 1. Summary of 454 pyrosequencing and subsequent read selection

Description                                       Reads     Average length (bp)   % of total reads
Total reads                                       296,079   255
Uniquely mapped reads                             257,651   260                   87.0
Reads containing L1ASP sequence at either end      45,399   287                   15.3
  LATRAP reads (potentially spliced reads)         34,832   282                   11.8
  Unspliced reads                                  10,567   304                    3.6

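
The percentages in Table 1 follow directly from the read counts. The short sketch below (dictionary keys and the helper name are illustrative) recomputes each percentage relative to the total number of reads and confirms that the LATRAP and unspliced reads together account for all reads containing L1ASP sequence at either end.

```python
TOTAL_READS = 296_079

# Read counts from Table 1.
READ_COUNTS = {
    "uniquely mapped": 257_651,
    "L1ASP sequence at either end": 45_399,
    "LATRAP (potentially spliced)": 34_832,
    "unspliced": 10_567,
}


def percent_of_total(count: int, total: int = TOTAL_READS) -> float:
    """Percentage of total reads, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in READ_COUNTS.items():
    print(f"{label}: {count:,} reads = {percent_of_total(count)}% of total")

# The two subcategories partition the L1ASP-containing reads.
subset_sum = READ_COUNTS["LATRAP (potentially spliced)"] + READ_COUNTS["unspliced"]
print(subset_sum == READ_COUNTS["L1ASP sequence at either end"])
```
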
Fig. 2.

Read mapping at XPR1 (A) and SCFD1 (B) loci in the UCSC Genome Browser format (Kent et al., 2002). At both loci, unspliced reads gave rise to clusters spanning the ASPs of intronic full-length L1s (green) and consecutive As in the flanking genomic sequences such as an Alu in the XPR1 locus (blue). This suggests that these reads originated from internal hybridization of the poly(dT) part of the RT primer with nascent transcripts from the parental gene promoters. To exclude these false positives, we used only spliced reads partially overlapping with L1ASPs, termed “LATRAP reads,” such as those indicated in red at the SCFD1 locus (B).

The same pattern of unspliced read mapping was observed for the intronic L1 in SCFD1 (Fig. 2B). The mapping cluster of unspliced reads that overlapped with this L1 ends at a genomic region containing 10 consecutive As. However, we also found at this locus spliced reads suggestive of Type VI transcripts containing the L1 5′ region and downstream SCFD1 exons (Fig. 2B, shown in red). Based on our manual curation of read mapping patterns at several L1ASP loci including these two, we decided to use only spliced reads that were uniquely mapped to L1ASPs (termed “LATRAP reads”) for identifying active L1ASPs by LATRAP. This criterion does exclude genuinely active L1ASPs that generate only Type I transcripts, but it reduces false positives, which is important for studying the locus-specific functionality of active L1ASPs in the future. Using this call criterion, we identified 313 L1ASPs, each of which yielded at least one LATRAP read in this pilot experiment using A549 cells (Supplementary Table S1).

Characterization of L1s with active ASPs in A549 cells

The 313 L1ASPs are remarkably different in the number of mapped LATRAP reads (Fig. 3). The majority of the L1ASPs were mapped with a small number of LATRAP reads. Of the 313 L1ASP loci, 119 (38%) were mapped with only one LATRAP read, and 41 (13%) and 22 (7%) loci yielded two and three LATRAP reads, respectively. Consistent with a previous report, these data may reflect the low cognate expression activities of most L1ASPs in A549 cells (Faulkner et al., 2009). On the other hand, a limited number of L1ASPs (12/313, 3.8%) gave rise to more than 500 LATRAP reads each and represent the most transcriptionally active loci in A549 cells. These 12 L1ASPs contribute to the A549 transcriptome in different modes. Three L1ASPs act as the parental promoters of annotated genes (FOCAD, MIR2052HG and CASC9). Four intronic L1ASPs, oriented in the same direction as the genes harboring the L1s (BCAS3, PLCB1, SCFD1 and MET), are used as alternative promoters for those genes, creating L1-gene chimeric RNAs. The remaining 5 L1ASPs are located between genes and actively transcribe intergenic and poly(A)+ RNAs, which are not currently annotated but are potentially involved in various cellular functions as long noncoding RNAs (Ulitsky and Bartel, 2013).
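
The proportions quoted above can be recomputed from the locus counts. The following sketch uses only numbers stated in this paragraph; the variable names are illustrative.

```python
TOTAL_LOCI = 313

# Number of L1ASP loci supported by exactly one, two or three LATRAP reads.
LOCI_BY_READ_COUNT = {1: 119, 2: 41, 3: 22}

# Loci with more than 500 LATRAP reads, split by their mode of contribution.
HIGHLY_ACTIVE = {
    "parental promoters of annotated genes": 3,
    "intronic alternative promoters": 4,
    "intergenic": 5,
}

for n_reads, n_loci in LOCI_BY_READ_COUNT.items():
    print(f"{n_reads} read(s): {n_loci} loci = {round(100 * n_loci / TOTAL_LOCI)}%")

low_activity_loci = sum(LOCI_BY_READ_COUNT.values())
low_activity_percent = round(100 * low_activity_loci / TOTAL_LOCI, 1)
print(f"loci with 1-3 reads: {low_activity_loci} ({low_activity_percent}%)")

highly_active_loci = sum(HIGHLY_ACTIVE.values())
highly_active_percent = round(100 * highly_active_loci / TOTAL_LOCI, 1)
print(f"loci with > 500 reads: {highly_active_loci} ({highly_active_percent}%)")
```

More than half of the detected loci are thus supported by three or fewer LATRAP reads, whereas fewer than 4% exceed 500 reads.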

Fig. 3.

Distribution of the number of LATRAP reads uniquely mapped to individual L1ASPs in A549 cells. Brief descriptions are shown for 12 L1ASPs uniquely mapped with more than 500 LATRAP reads.

Another noteworthy feature of the 313 L1ASPs is that heavily truncated L1s can possess transcriptional activity in L1ASPs. For instance, the L1 functioning as a parental promoter of PRKG2 has only 1,137 bp of the 5′-terminal region, including the entire internal promoter and part of ORF1 (Fig. 1A). In addition to the full-length L1ASPs, truncated L1s have the potential to become transcriptionally active as long as they include the ASP regions.

Evolutionarily young L1 subfamilies, such as L1HS, L1PA2 and L1PA3, account for most of the 12 L1ASPs found to be the most active as well as others mapped with fewer LATRAP reads. It should be noted that LATRAP could preferentially identify relatively young active L1ASPs because the L1ASP-specific PCR primer was designed against L1HS. Owing to the significant sequence divergence between L1HS and the older L1 subfamilies, the use of this primer did not allow cloning of L1ASPs originating from much older subfamilies of L1s. On the other hand, CAGE (cap analysis of gene expression) screening identified many active ASPs of older subfamilies, including L1M (mammals) (Faulkner et al., 2009). CAGE reads are too short to uniquely identify a single active L1 among young L1 family elements in the genome, which are almost identical in their nucleotide sequences around the transcription initiation sites of ASPs. In contrast, CAGE reads tend to map uniquely to older L1 loci, all of which are evolutionarily more diverse in their sequences than young L1s. LATRAP can thus be a complementary approach to the CAGE method to identify active L1ASPs of all evolutionary ages. Alternatively, designing primers that are specialized for L1M consensus sequences may extend the applicable range of LATRAP to older subfamilies.

The human reference genome (GRCh37/hg19) contains 414, 1,090 and 1,578 ASPs originating from the youngest L1 subfamilies L1HS, L1PA2 and L1PA3, respectively (Karolchik et al., 2004). Of these, only a small number of L1s have transcriptionally active ASPs in A549 cells. This strongly suggests that ASP activation is not primarily induced by internal sequences, as the nucleotide sequences of these young L1ASPs are all quite similar, but is rather controlled by the genomic environments of the L1ASP loci. However, our current understanding of which genetic and environmental factors contribute to the activation of only a subset of L1ASPs remains primitive. It is therefore currently not possible to predict which of the thousands of young L1ASPs become active under any given cellular or tissue conditions. To further our understanding of the mechanisms of locus-specific L1ASP activation, LATRAP is useful for studying each of the active L1ASPs by separating them from the majority of L1ASPs that are transcriptionally silent. Mouse L1 promoters also possess ASP activity, despite having nucleotide sequences that differ completely from those of human ASPs (Li et al., 2014). It will be of interest to apply the LATRAP strategy to methylation-deficient and/or PIWI-interacting RNA pathway-deficient mice so as to address a fundamental question of L1ASP regulation: how does hypomethylation affect transcriptional activity at individual L1ASP loci in vivo (Bourc’his and Bestor, 2004; Kuramochi-Miyagawa et al., 2008; Crichton et al., 2014)?

In addition to the regulatory mechanism of L1ASP activation, the functional impact of L1ASP activation on the host system also needs to be examined separately at every L1ASP locus. L1ASP-derived transcripts are believed to play functional roles in gene regulation and/or cell signaling, although this remains hypothetical. As described in the Introduction, activation of different L1ASP loci can lead to expression of RNA transcripts and polypeptides that vary in their nucleotide and amino acid sequences. It is therefore foreseeable that functional effects of activation of L1ASPs located at different genomic positions are different, even if they are categorized into the same subfamily. In this situation, LATRAP is a powerful experimental benchmark to pinpoint active L1ASPs in any cellular and/or tissue samples from which total RNAs are available, allowing us to study the locus-specific functionality of L1ASPs.

ACKNOWLEDGMENTS

We thank Drs. Ken Kurokawa (National Institute of Genetics) and Fumito Maruyama (Kyoto University) for assistance with data analysis, Dr. Hiroyuki Ohta (Tokyo Institute of Technology) for the pyrosequencing analysis, and Dr. Takashi Hirano (Okinawa Institute of Advanced Sciences), Dr. Haruhiko Siomi (Keio University) and the members of the Aizawa Laboratory for fruitful discussions and strong support. This work was supported by the Global COE Program “From the Earth to ‘Earths’” and by the Okinawa Research and Industrialization Project for the Forefront of Medical Care.

© 2017 by The Genetics Society of Japan