Origin of an endogenous bornavirus-like nucleoprotein element in thirteen-lined ground squirrels

Yoshiyuki Suzuki*, Yuki Kobayashi, Masayuki Horie and Keizo Tomonaga

Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya-shi, Aichi-ken 467-8501, Japan
Nihon University Veterinary Research Center, 1866 Kameino, Fujisawa-shi, Kanagawa 252-0880, Japan
Transboundary Animal Diseases Research Center, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto 1-21-24, Kagoshima 890-0065, Japan
Department of Viral Oncology, Institute for Virus Research (IVR), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan

Edited by Toshihiko Shiroishi
* Corresponding author. E-mail: yossuzuk@nsc.nagoya-cu.ac.jp

Endogenous bornavirus-like (EBL) elements are nucleotide sequences in animal genomes that are homologous to bornavirus genes (Horie and Tomonaga, 2011). EBL elements homologous to N (EBLN) and G (EBLG) have been identified in the genomes of vertebrates, and those homologous to M (EBLM) and L (EBLL) in the genomes of both vertebrates and invertebrates (Belyi et al., 2010; Horie et al., 2010, 2013; Katzourakis and Gifford, 2010). It has been proposed that endogenous virus-like elements are involved in protection against infection by homologous viruses (Arnaud et al., 2007; Aswad and Katzourakis, 2012). Indeed, there appears to be a negative correlation between the presence of EBLN elements and susceptibility to bornavirus infection (Belyi et al., 2010; Katzourakis and Gifford, 2010). In addition, since Borna disease is induced by immune responses to infected cells expressing N (Planz and Stitz, 1999; Stitz et al., 2002), expression of proteins from EBLN elements may induce immunological tolerance to Borna disease virus (BDV) infection (Horie et al., 2013). Therefore, even if the antiviral effects of EBLN are insufficient to prevent infection, infected organisms may survive, possibly serving as reservoirs (Belyi et al., 2010).
EBL elements generally consist of a single gene, sometimes accompanied by bornavirus transcription start and termination signals and a poly(A) tail. They are therefore considered to have originated from bornavirus mRNA through reverse transcription (Horie et al., 2010; Kinnunen et al., 2011). Although bornaviruses do not encode a reverse transcriptase, this enzymatic activity could be provided by long terminal repeat (LTR) retrotransposons such as exogenous and endogenous retroviruses (XRVs and ERVs, respectively), by non-LTR retrotransposons such as long interspersed element-1 (LINE-1), or by telomerase (Wells et al., 1990; Maida et al., 2009; Kopera et al., 2011). Because primate EBLN-1 to -4, as well as some other EBL elements, are followed by a poly(A) tail and flanked by a direct repeat, it was hypothesized that EBL elements were produced mainly with the aid of LINE-1 (Belyi et al., 2010; Horie et al., 2010; Katzourakis and Gifford, 2010). LINE-1 is known to be responsible for the retrotransposition of short interspersed elements (SINEs) as well as of itself, and for the formation of processed pseudogenes (Esnault et al., 2000). EBLN-1 to -4 were inferred to have been generated independently before the divergence of Old World and New World monkeys (44.2 million years ago [MYA]), a period that coincided with LINE-1 activity in primates (40-50 MYA) (Ohshima et al., 2003; Hedges et al., 2006; Kobayashi et al., 2011). It should be noted, however, that several EBL elements lack the signatures of LINE-1-mediated integration (Belyi et al., 2010; Horie et al., 2010, 2013; Katzourakis and Gifford, 2010). Similar features are also apparent in endogenous non-retroviral virus-like elements in insect genomes. These observations imply that not only LINE-1-mediated but also non-LINE-1-mediated integration has contributed to the production of endogenous virus-like elements, raising the possibility that these elements play a more general role in the evolution of viruses and their hosts. However, evidence for non-LINE-1-mediated integration of endogenous non-retroviral virus-like elements has not yet been documented.
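The two LINE-1 signatures mentioned above, a 3' poly(A) tail and flanking direct repeats (target-site duplications), can be screened for with a few lines of code. The following is a minimal Python sketch, not the procedure used in the cited studies; the function names, the exact-match criterion, and the thresholds (`min_len`, `max_len`, `min_poly_a`) are illustrative assumptions.

```python
def poly_a_length(seq: str) -> int:
    """Count consecutive A residues at the 3' end of seq."""
    count = 0
    for base in reversed(seq.upper()):
        if base != "A":
            break
        count += 1
    return count


def direct_repeat(upstream: str, downstream: str,
                  min_len: int = 7, max_len: int = 30) -> str:
    """Return the longest exact repeat formed by the 3' end of the upstream
    flank and the 5' end of the downstream flank, or "" if none reaches
    min_len. Real target-site duplications may carry mismatches; this
    sketch (an assumption) only detects exact copies."""
    up, down = upstream.upper(), downstream.upper()
    limit = min(max_len, len(up), len(down))
    for length in range(limit, min_len - 1, -1):
        if up[len(up) - length:] == down[:length]:
            return up[len(up) - length:]
    return ""


def has_line1_signatures(upstream: str, element: str, downstream: str,
                         min_poly_a: int = 10) -> bool:
    """True if the element ends in a poly(A) tail and is flanked by a direct repeat."""
    return (poly_a_length(element) >= min_poly_a
            and direct_repeat(upstream, downstream) != "")


# Toy example with a made-up 10-nt target-site duplication (hypothetical data).
tsd = "GAATTCTTAG"
upstream = "CCGTACGT" + tsd
element = "ATGCCC" + "A" * 15
downstream = tsd + "TTGCAGGC"
print(direct_repeat(upstream, downstream))                      # GAATTCTTAG
print(has_line1_signatures(upstream, element, downstream))      # True
print(has_line1_signatures(upstream, element, "TTGCAGGCAAAT"))  # False
```

An element lacking either signature, as reported for several EBL elements, would return `False` here.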
The genome of the thirteen-lined ground squirrel (Ictidomys tridecemlineatus) contains an EBLN element (itEBLN) that is highly similar to bornavirus N (77% amino acid sequence identity) (Belyi et al., 2010; Horie et al., 2010; Katzourakis and Gifford, 2010). In phylogenetic analyses of bornavirus N and EBLN elements, itEBLN was located within the cluster of bornavirus N (Horie et al., 2010; Katzourakis and Gifford, 2010), although bootstrap support was low (21%) (Horie et al., 2010). Low-stringency Southern blot hybridization of closely related species, including the woodchuck (Marmota monax), using itEBLN as a probe did not produce any positive band (Horie et al., 2010); itEBLN is therefore considered to have been generated after the divergence of thirteen-lined ground squirrels and woodchucks 8.5 MYA (Giboulet et al., 1997; Mercer and Roth, 2003; Obolenskaya et al., 2009; Horie et al., 2010). The presence of a poly(A) tail is consistent with LINE-1-mediated integration of itEBLN. However, LINE-1 activity has been reported to have been lost on the lineage of thirteen-lined ground squirrels 4-5 MYA (Platt II and Ray, 2012). If itEBLN was integrated by LINE-1, its integration must therefore have occurred after the squirrel-woodchuck divergence but before the loss of LINE-1 activity, that is, within the interval of 4-8.5 MYA. Estimating the integration time of itEBLN thus provides a test of LINE-1-mediated integration. The purpose of the present study was to conduct molecular evolutionary analyses to gain insights into the integration time of itEBLN.
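The dating argument can be stated as a simple interval check. A minimal sketch, assuming the bounds quoted above (8.5 MYA for the squirrel-woodchuck divergence and 4-5 MYA for the loss of LINE-1 activity); the function name is hypothetical.

```python
SPLIT_MYA = 8.5  # divergence of thirteen-lined ground squirrels and woodchucks


def compatible_with_line1(integration_mya: float, line1_loss_mya: float = 4.0) -> bool:
    """Return True if an integration time (million years ago) lies in the
    window where LINE-1-mediated integration of itEBLN is possible:
    after the squirrel-woodchuck split but before LINE-1 activity was lost.
    The default uses the most permissive loss time (4 MYA)."""
    return line1_loss_mya <= integration_mya <= SPLIT_MYA


print(compatible_with_line1(6.0))                       # True
print(compatible_with_line1(4.5, line1_loss_mya=5.0))   # False: after the later loss date
print(compatible_with_line1(0.3))                       # False: too recent
```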
Nucleotide sequences encoding amino acid sequences homologous to bornavirus N were retrieved from the International Nucleotide Sequence Database (INSD) with TBLASTN (version 2.2.29+) (Altschul et al., 1997), using N from strain ABVNL-001 (INSD accession number: FJ792853) (De Kloet and Dorrestein, 2009) as a query. As of December 27, 2013, 108 sequences with an e-value of 0.0 and three sequences with e-values of < 1e-100 were identified using the Nucleotide Collection and the Whole-Genome Shotgun Contigs as databases, respectively. The same set of sequences was obtained using strain No/98 (AJ311524) (Pleschka et al., 2001) as a query, whereas only a subset of these sequences was obtained using strain VS-4707 (KF680099) (Rubbenstroth et al., 2014). These three strains are representative of the three major clusters of bornaviruses (see Fig. 1). After eliminating recombinant sequences (AY705791 and AY705792) (Schneider et al., 2005), 109 sequences were used for the phylogenetic analysis. They included 71 sequences for BDV N, 31 sequences for ABV N, and six sequences for itEBLN, in addition to one sequence for an EBLN element from the cape golden mole (Chrysochloris asiatica) (caEBLN) (Supplementary Table S1). The six itEBLN sequences were apparently derived from a single locus, because only a single sequence, to which they are identical or nearly identical, occurs in the genomic assembly of the thirteen-lined ground squirrel (SpeTri2.0). caEBLN has been found to be closely related to, but outside the cluster of, bornavirus N and itEBLN (Horie et al., 2013), and was therefore used as the outgroup for examining the phylogenetic relationship between them in the present study. A multiple alignment of the 109 amino acid sequences was made with MAFFT (version 6.901b) (Katoh et al., 2002), and 320 sites were found to be shared by all sequences. The best-fitting model of amino acid substitution, selected with MEGA (version 5.2.2) (Tamura et al., 2011), was the JTT model (Jones et al., 1992) with rate heterogeneity among sites (Γ shape parameter = …). A phylogenetic tree was constructed by the neighbor-joining method (Saitou and Nei, 1987), and the reliability of interior branches was assessed by computing the bootstrap probability with 1,000 resamplings (Felsenstein, 1985).
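Two steps in this paragraph, extracting the sites shared by all sequences and computing bootstrap probabilities by resampling sites, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: "shared" is taken to mean alignment columns with no gap ('-') or unknown residue ('X') in any sequence, and tree building is delegated to a caller-supplied function (`clade_present`, hypothetical) that would, for example, build a neighbor-joining tree from a replicate and test for a clade. It does not reproduce the MAFFT/MEGA pipeline itself.

```python
import random
from typing import Callable, Dict

Alignment = Dict[str, str]


def shared_sites(alignment: Alignment, missing: str = "-X") -> Alignment:
    """Keep only columns in which no sequence has a gap or unknown residue."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length)
            if all(s[i] not in missing for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample columns with replacement, keeping the alignment length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[i] for i in cols) for name, s in alignment.items()}


def bootstrap_probability(alignment: Alignment,
                          clade_present: Callable[[Alignment], bool],
                          n_reps: int = 1000, seed: int = 1) -> float:
    """Percentage of replicates in which clade_present returns True."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_reps)
               if clade_present(bootstrap_replicate(alignment, rng)))
    return 100.0 * hits / n_reps


toy = {"a": "MK-LV", "b": "MKQLV", "c": "M-QLI"}  # hypothetical toy alignment
print(shared_sites(toy))  # {'a': 'MLV', 'b': 'MLV', 'c': 'MLI'}
```

In the analysis described above, each of the 1,000 replicates would additionally pass through JTT + Γ distance estimation and neighbor-joining inside `clade_present`; that step is omitted here.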
The phylogenetic tree of bornavirus N and itEBLN using caEBLN as the outgroup is presented in Fig. 1. itEBLN was located inside the cluster of bornavirus N, consistent with previous studies that adopted different methods and outgroups (Horie et al., 2010; Katzourakis and Gifford, 2010). However, bootstrap support for this relationship was low (37%), as in the previous study (21%) (Horie et al., 2010). Because itEBLN branches off close to the base of the bornavirus N cluster, these results suggest that the integration time of itEBLN was close to the time of the most recent common ancestor (MRCA) for bornavirus N. The integration time of itEBLN cannot be estimated directly, however, because the evolutionary rate of itEBLN is unknown. In contrast, the time of the MRCA for bornavirus N may be estimated by comparing bornavirus strains isolated at different time points (Nei, 1983). Insights into the integration time of itEBLN may therefore be gained by estimating the time of the MRCA for bornavirus N, taking into account the relationship between them observed above.
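The idea of dating an ancestor from strains isolated at different times can be illustrated with root-to-tip regression, a simpler technique than the Bayesian analysis used in this study: distances from the root to each tip are regressed on isolation year, the slope estimates the substitution rate, and the x-intercept estimates the year of the MRCA. A minimal sketch; the input distances are synthetic and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_tmrca(years: Sequence[float],
                      distances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance (substitutions/site) against
    isolation year. Returns (rate per site per year, estimated MRCA year)."""
    n = len(years)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two (year, distance) pairs")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    if sxx == 0:
        raise ValueError("isolation years must differ")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal")
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate  # year at which the fitted distance is zero


# Synthetic strains evolving at 1e-4 substitutions/site/year from a root in 1000.
years = [1929, 1990, 2011]
dists = [1e-4 * (y - 1000) for y in years]
rate, mrca_year = root_to_tip_tmrca(years, dists)
print(f"{rate:.1e}", round(2011 - mrca_year))  # 1.0e-04 1011
```

The second printed value expresses the MRCA, as in this study, in years before the latest isolation (2011).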
In general, the recombination rate of viruses belonging to the order Mononegavirales is considered to be negligible, because the formation of ribonucleoprotein (RNP) complexes inhibits copy-choice recombination by the RNA-dependent RNA polymerase (Conzelmann, 1998; Chare et al., 2003). The time of the MRCA for bornavirus N is thus considered to be equivalent to that for the other proteins, and, to minimize the estimation error, all proteins encoded by the bornavirus genome were used for estimating the time of the MRCA. The genomic sequences of six BDVs and 14 ABVs with known isolation years were retrieved from the INSD (Supplementary Table S1). For each non-overlapping region of the open reading frames, the 20 amino acid sequences were aligned, and all the resulting alignments were concatenated into one, in which 2,732 sites were shared by all sequences. The JTT model with rate heterogeneity among sites (Γ shape parameter = 0.46) was selected as the best-fitting model. The time of the MRCA was estimated with BEAST (version 1.8.0) (Drummond et al., 2012), assuming several models for rate heterogeneity among lineages (strict clock, lognormal relaxed clock, exponential relaxed clock, and random local clock) and for population growth (constant size, exponential growth, logistic growth, expansion growth, and Bayesian skyline). A Markov chain Monte Carlo (MCMC) analysis was run for 10,000,000 generations. After discarding the first 1,000,000 generations as burn-in, parameter values were sampled every 1,000 generations to obtain posterior distributions. The parameter estimation was validated by an effective sample size (ESS) of > 100 for each combination of models.
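The sampling scheme above retains (10,000,000 - 1,000,000) / 1,000 = 9,000 samples per parameter, and the effective sample size discounts these for autocorrelation along the chain. A minimal sketch of one common estimator, ESS = N / (1 + 2 Σ ρ_k) with the sum truncated at the first non-positive autocorrelation; BEAST's companion tools may use a different estimator, and the synthetic AR(1) trace is for illustration only.

```python
import random
from typing import List, Sequence

GENERATIONS = 10_000_000
BURN_IN = 1_000_000
SAMPLE_EVERY = 1_000
N_SAMPLES = (GENERATIONS - BURN_IN) // SAMPLE_EVERY  # 9,000 retained samples


def effective_sample_size(trace: Sequence[float]) -> float:
    """ESS = N / (1 + 2 * sum of lag-k autocorrelations), truncating the sum
    at the first non-positive autocorrelation."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


def ar1_trace(phi: float, n: int, seed: int = 42) -> List[float]:
    """Synthetic autocorrelated chain x_t = phi * x_(t-1) + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


print(N_SAMPLES)  # 9000
ess = effective_sample_size(ar1_trace(0.9, 2000))
print(f"ESS = {ess:.0f}; passes ESS > 100: {ess > 100}")
```

A strongly autocorrelated chain yields an ESS far below its nominal length, which is why a minimum ESS is checked rather than the raw number of samples.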
Estimates for the time of the MRCA for all bornavirus proteins, which is considered to be equivalent to that for N, are summarized in Table 1. The time of the MRCA is presented as years before the isolation of the latest strain (2011). For some combinations of models, the estimates were ~82 years ago (YA), which corresponds to the isolation year of the earliest strain (1929). These estimates are evidently unrealistic and reflect an incompatibility of the combinations of models assumed. In the other cases, the estimates varied to some extent, with median values ranging from 3,697 to 20,481 YA. Importantly, however, all the estimates were < 0.3 MYA even when the upper bound of the 95% highest posterior density (HPD) interval was taken into consideration. These results indicate that the MRCA for bornavirus N existed much later than the loss of LINE-1 activity on the lineage of thirteen-lined ground squirrels, which may have occurred 4-5 MYA (Platt II and Ray, 2012). Therefore, the integration time of itEBLN, which was considered to be close to the time of the MRCA for bornavirus N, is also likely to be much later than the loss of LINE-1 activity.

Table 1 footnote: a Median (lower-upper bound of 95% HPD interval). The time of the MRCA is described as years before the isolation of the latest strain (2011).
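Two conversions used in this paragraph can be made explicit: estimates are reported as years before 2011, so ~82 YA corresponds to 2011 - 82 = 1929, and a 95% HPD interval is the shortest interval containing 95% of the posterior samples. A minimal sketch; the function names are hypothetical and the sample values are made up.

```python
import math
from typing import Sequence, Tuple

LATEST_ISOLATION_YEAR = 2011


def calendar_year(years_ago: float) -> float:
    """Convert 'years before the latest isolation' to a calendar year."""
    return LATEST_ISOLATION_YEAR - years_ago


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval spanning floor(mass * n) + 1 sorted posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = int(math.floor(mass * n))
    if not 0 < k < n:
        raise ValueError("need more samples for this mass")
    widths = [xs[i + k] - xs[i] for i in range(n - k)]
    best = widths.index(min(widths))
    return xs[best], xs[best + k]


print(calendar_year(82))  # 1929
# A right-skewed toy posterior (years ago): the HPD excludes the long tail.
print(hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50, 60], mass=0.75))  # (0, 9)
```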
The relatively recent integration of itEBLN compared with the loss of LINE-1 activity on the lineage of thirteen-lined ground squirrels supports non-LINE-1-mediated integration of itEBLN. In considering the mechanism of non-LINE-1-mediated integration, it is noteworthy that recombination of an exogenous RNA virus with an ERV has been shown to promote the integration of non-retroviral virus-like elements in somatic cells (Klenerman et al., 1997; Geuking et al., 2009). Moreover, in these cases, the recombinants appeared to integrate at sites of DNA damage through non-homologous end joining, and the retrotransposition event did not give rise to a direct repeat. To examine whether itEBLN resulted from recombination involving an ERV, the 2,000 nucleotides upstream and downstream of itEBLN were searched for repetitive DNA elements deposited in the REPBASE library using CENSOR (Kohany et al., 2006). Interestingly, two ERV-derived elements and one SINE-derived element were observed in the upstream region, and one ERV-derived element in the downstream region (Fig. 2). In particular, the upstream ERV- and SINE-derived elements were located in close proximity to itEBLN. A retroposition of an ERV has been reported on the lineage of thirteen-lined ground squirrels 0.3 MYA (Squire and Andrews, 2003), suggesting that ERVs may have been active at the time of itEBLN integration. It is conceivable that bornavirus N mRNA recombined with RNA transcripts of two ERVs and a SINE, and that the recombinant was then reverse transcribed by an ERV-encoded reverse transcriptase and integrated into the genomic DNA of germ line cells, giving rise to itEBLN. In conclusion, there appear to be multiple mechanisms that generate endogenous virus-like elements in animal genomes, reinforcing the importance of these elements in the evolution of viruses and their hosts.
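The flanking-region search described above amounts to taking 2,000-nt windows on either side of the element and assigning each repeat hit to the upstream or downstream window, ordered by distance from the element. A minimal sketch, assuming 1-based inclusive contig coordinates, an element on the forward strand (on the reverse strand the two windows swap roles), and hit coordinates already obtained from a repeat-annotation tool such as CENSOR; the coordinates and repeat names below are hypothetical, not those shown in Fig. 2.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    name: str   # repeat family reported by the search (hypothetical here)
    start: int  # 1-based, inclusive contig coordinates
    end: int


def flanking_windows(elem_start: int, elem_end: int, contig_len: int,
                     flank: int = 2000) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Upstream and downstream windows, clipped to the contig ends."""
    upstream = (max(1, elem_start - flank), elem_start - 1)
    downstream = (elem_end + 1, min(contig_len, elem_end + flank))
    return upstream, downstream


def assign_hits(hits: List[Hit], elem_start: int, elem_end: int,
                contig_len: int, flank: int = 2000) -> Dict[str, List[Tuple[str, int]]]:
    """Group hits overlapping each window as (name, gap to the element in nt)."""
    (u0, u1), (d0, d1) = flanking_windows(elem_start, elem_end, contig_len, flank)
    result: Dict[str, List[Tuple[str, int]]] = {"upstream": [], "downstream": []}
    for h in hits:
        if h.end >= u0 and h.start <= u1:
            result["upstream"].append((h.name, max(0, elem_start - h.end - 1)))
        elif h.end >= d0 and h.start <= d1:
            result["downstream"].append((h.name, max(0, h.start - elem_end - 1)))
    for side in result.values():
        side.sort(key=lambda pair: pair[1])  # nearest first
    return result


hits = [Hit("ERV_like_1", 2700, 2950), Hit("SINE_like_1", 2500, 2690),
        Hit("ERV_like_2", 5000, 5400), Hit("distant_repeat", 8000, 8100)]
print(assign_hits(hits, elem_start=3001, elem_end=4200, contig_len=10000))
# {'upstream': [('ERV_like_1', 50), ('SINE_like_1', 310)], 'downstream': [('ERV_like_2', 799)]}
```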

Fig. 1. Phylogenetic tree of bornavirus N and itEBLN using caEBLN as the outgroup. Bootstrap probabilities are provided for interior branches. Black and white arrowheads indicate the time of the MRCA for bornavirus N and the time of integration of itEBLN, respectively. The scale bar indicates the number of amino acid substitutions per site.

Fig. 2. Schematic diagram of the genomic structure of the itEBLN locus and its upstream and downstream flanking 2,000 nucleotides, included in Whole Genome Shotgun contig12043 (AGTP01124043). The class/subclass (upper) and name (lower) of the homologous sequence in the REPBASE library are indicated for each boxed region, except for itEBLN.

Table 1. Estimates for the time of the MRCA for bornavirus N