Stress-responsive retrotransposable elements in conifers

Tokuko Ujino-Ihara*
Ecological Genetics Laboratory, Department of Forest Molecular Genetics and Biotechnology, Forestry and Forest Products Research Institute, Tsukuba, Ibaraki 305-8687, Japan
(Received 29 March 2022, accepted 28 July 2022; J-STAGE Advance published date: 15 November 2022)

Conifers are important in many forest ecosystems. Because they have a long generation time and are immobile, they require considerable plasticity to adapt to environmental stresses. Conifers also have large genomes, a high proportion of which is occupied by repetitive elements. Retrotransposons are the most abundant repetitive elements in the conifers whose whole genomes have been sequenced. These retrotransposons are usually silenced to maintain genome integrity; however, some are activated by environmental stress. The insertion of retrotransposons into genic regions is associated with phenotypic and genetic diversity. The large number and high diversity of retrotransposons in conifer genomes suggest that they play a role in adaptation to the environment. This review summarizes progress in research on the roles of retrotransposons in the stress responses of conifers and discusses potential future work.

Key words: conifer, retrotransposon, stress response


INTRODUCTION
Conifers form a major clade of extant gymnosperms comprising approximately 630 species. This clade diverged from the angiosperm lineage around 300 MYA. Conifers are divided into two main groups based on molecular phylogenetics: Conifers I (Pinaceae conifers) and Conifers II (non-Pinaceae conifers, or cupressophytes) (Bowe et al., 2000; Ran et al., 2018). Most conifers are diploid and have large genomes (4–35 Gb), a size attributed to the expansion of repetitive elements (REs). Recent progress in next-generation sequencing has permitted the detailed analysis of whole-genome sequences, and whole genomes have now been sequenced for several species of both Conifers I and II. These data confirmed a high proportion of REs in the genomes of both groups, with REs representing 61–82% of the entire genome (Table 1). Transposable elements (TEs), especially long terminal repeat retrotransposons (LTR-RTs), constitute a substantial component of these genomic REs. The two main superfamilies of LTR-RTs are Ty3/gypsy and Ty1/copia. In the conifers listed in Table 1, Ty3/gypsy families are more prevalent than Ty1/copia families. Phylogenetic analyses have shown that many species-specific Ty3/gypsy and Ty1/copia families have expanded in conifer genomes, although several highly conserved families also exist (Nystedt et al., 2013; Stevens et al., 2016; Song et al., 2021; Xiong et al., 2021; Niu et al., 2022). Niu et al. (2022) reported that most TEs are probably methylated and transcriptionally silenced in Chinese pine (Pinus tabuliformis). However, evidence of recent transposition of the Ty1/copia family PARTC was detected in sugar pine (P. lambertiana) in the form of a recently inserted heterozygous element (Stevens et al., 2016). In some angiosperm species, such as Arabidopsis thaliana, the proportion of the genome occupied by LTR-RTs is low, probably because they are deleted by unequal or illegitimate recombination (Cossu et al., 2017). Unequal recombination between the two LTRs of an element removes the internal region and leaves a solo-LTR, so solo-LTRs are a signature of LTR-RT removal by this route. Unequal recombination is probably less frequent in conifers, because the ratio of intact LTR-RTs to solo-LTRs is relatively high in conifers (Cossu et al., 2017; Voronova et al., 2017). However, this ratio can also become high when LTR-RT transposition occurs frequently, and the frequency of such transposition in conifers has not been well estimated.
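The reasoning about solo-LTRs can be made concrete with a small counting sketch in Python. It assumes a hypothetical, simplified annotation in which each feature on a contig is labelled either "LTR" or "INT" (internal region); an intact element is taken to be an LTR, an internal region and an LTR in direct succession, and any other LTR is counted as a solo-LTR. The `Feature` class and both functions are illustrative names, not part of any published pipeline, and real annotation tools also check LTR similarity, target site duplications and element length, which this sketch omits.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical repeat annotation on one contig (0-based, half-open)."""
    start: int
    end: int
    kind: str  # "LTR" or "INT" (internal region); labels are assumptions


def count_intact_and_solo(features: list[Feature]) -> tuple[int, int]:
    """Count LTR-INT-LTR runs as intact elements; any other LTR is a solo-LTR."""
    feats = sorted(features, key=lambda f: f.start)
    intact = solo = 0
    i = 0
    while i < len(feats):
        if feats[i].kind != "LTR":
            i += 1  # an internal region without flanking LTRs is skipped
            continue
        if (i + 2 < len(feats)
                and feats[i + 1].kind == "INT"
                and feats[i + 2].kind == "LTR"):
            intact += 1  # 5' LTR, internal region, 3' LTR
            i += 3
        else:
            solo += 1  # an LTR left without an internal region
            i += 1
    return intact, solo


def intact_to_solo_ratio(features: list[Feature]) -> float:
    """Ratio of intact elements to solo-LTRs (infinite if no solo-LTRs)."""
    intact, solo = count_intact_and_solo(features)
    return math.inf if solo == 0 else intact / solo


if __name__ == "__main__":
    toy = [
        Feature(0, 300, "LTR"), Feature(300, 4000, "INT"), Feature(4000, 4300, "LTR"),
        Feature(9000, 9300, "LTR"),  # solo-LTR
        Feature(15000, 15300, "LTR"), Feature(15300, 19000, "INT"),
        Feature(19000, 19300, "LTR"),
    ]
    print(count_intact_and_solo(toy))  # (2, 1)
    print(intact_to_solo_ratio(toy))   # 2.0
```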
Conifers are important in many forest ecosystems and are adapted to a range of environments in different parts of the world. LTR-RTs play roles in adaptation to biotic and abiotic stresses in plant species (Grandbastien, 1998; Deneweth et al., 2022). As previously reported in angiosperm species, stress-induced transcriptional activation and transposition of LTR-RTs can alter the expression of genes in host genomes (Ito et al., 2011; Butelli et al., 2012; Kashino-Fujii et al., 2018; Hsu et al., 2019). The high proportion of LTR-RTs in conifer genomes suggests that they contribute to environmental adaptation. Although only a limited number of studies have addressed the role of LTR-RTs in the stress response of conifers, the findings reported to date are reviewed here, and future research directions are discussed.

TRANSCRIPTIONAL ACTIVATION OF LTR-RTs BY BIOTIC AND ABIOTIC STRESSES IN Pinus sylvestris L.
The activation of LTR-RTs by stress in conifers was first reported in Scots pine (P. sylvestris L.) (Voronova et al., 2011). Because the whole-genome sequence of Scots pine was unavailable when that study was conducted, universal primers derived from the primer binding site (PBS) located downstream of the 5′ LTR were used to amplify LTR-RT regions of the target species (Kalendar et al., 2010). Polymerase chain reaction (PCR) fragments amplified from cDNAs using PBS primers were compared between heat-stressed (40 °C for 16 h) and control seedlings. Both Ty3/gypsy- and Ty1/copia-related transcripts were induced by heat stress. Despite the phylogenetic separation between gymnosperms and angiosperms, some of the transcribed RTs in conifers retained more than 80% sequence identity to those of angiosperms.
Heat-induced transcription of LTR-RTs in seedlings was further confirmed using ramets with identical genetic backgrounds (Voronova et al., 2014). In addition to heat stress, LTR-RT activation was surveyed under the following stimuli: pine woolly aphid infestation, abscisic acid treatment and salicylic acid treatment. Abscisic acid and salicylic acid are phytohormones that play major roles in drought responses and in plant immune responses, respectively. About 30 sequences with high identity to known LTR-RTs were detected, although they were partial and corresponded to different regions of LTR-RTs. More LTR-RT-like sequences were identified under heat stress and aphid infestation than under phytohormone treatment, and more Ty3/gypsy-like than Ty1/copia-like sequences were identified. Some LTR-RT-like sequences were amplified specifically from heat-stressed or infested samples, whereas others were amplified under all stimuli. Several LTR-RT-like transcripts were shared between the infestation and salicylic acid treatments. Because salicylic acid is involved in plant defense responses, the authors speculated that these sequences may be induced by the same regulatory system via common cis- and trans-elements.
The response of LTR-RTs to fungal pathogens was also tested using PCR primers specific to nine major RT families in Scots pine (Voronova, 2019). Plants were inoculated with two fungal pathogens that differ in pathogenicity. Both pathogens induced the transcription of LTR-RT families; however, the magnitude and pattern of LTR-RT transcription during the spread of infection differed according to host genotype and pathogen. The expression patterns of the nine tested LTR-RT families were highly correlated. The author argued that the activation of LTR-RTs by fungal pathogens may be caused by changes in the host's global chromatin methylation state. Transcript levels did not reflect copy number: the transcriptional induction of IFG, the most abundant LTR-RT family in Scots pine, was comparable to that of other LTR-RTs. Although the differentially expressed LTR-RT-like sequences reported in these studies were partial rather than complete LTR-RT sequences, this series of studies indicates that various LTR-RT families have probably acquired stress responsiveness in Scots pine.

Table 1 notes: … Pellicer and Leitch, 2020), with the exception of Taxus yunnanensis, for which the value was estimated from the k-mer distribution in the cited reference. b Proportion of REs in the whole genome. c The method used to calculate the proportion is described in the cited paper.

ACTIVATION OF LTR-RTs BY HEAT STRESS IN Cryptomeria japonica
Although the LTR-RT response to stress has been studied intensively in Scots pine, the expression of full-length retrotransposons has not been reported in this species. PCR-based analysis of LTR-RTs is limited by their complexity, whereas RNA-Seq based on next-generation sequencing has greatly facilitated the analysis of transcribed LTR-RTs. Recently, transcriptionally active complete LTR-RTs were identified in Japanese cedar (Cryptomeria japonica), which belongs to the Cupressaceae family in Conifers II and is known as "sugi" in Japan. The genome size of Japanese cedar is estimated to be 10.8 Gb (Hizume et al., 2001), and, as in other conifers, repetitive sequences are likely to contribute substantially to this size (Tamura et al., 2015).
Japanese cedar can adapt to a wide range of environmental conditions and is therefore distributed throughout Japan, from northern to southern areas, suggesting that the species has adapted to a wide temperature range. To explore the molecular mechanism underlying heat adaptation in this species, a transcriptome analysis was conducted on seedlings subjected to heat stress at 45 °C for 3 h (Ujino-Ihara, 2020). Half of the seedlings were heat-acclimated (38 °C for 2 h on two consecutive days) before the heat stress was applied. This treatment causes physiological changes in Japanese cedar seedlings and probably improves their heat tolerance (Ujino-Ihara, unpublished data). LTR-RT-related sequences accounted for a high proportion of the transcripts that were differentially expressed with heat acclimation. Most LTR-RT-related sequences were strongly upregulated by heat stress in non-heat-acclimated seedlings, but not in heat-acclimated ones (Ujino-Ihara, 2020; Fig. 1). Although the numbers of Ty1/copia- and Ty3/gypsy-related sequences were comparable, several Ty3/gypsy-related sequences were expressed more prominently than the other LTR-RT-related sequences. These abundant LTR-RT sequences were partial or truncated, with the exception of a transcript named CJHS031206, which appeared to include a full-length Ty3/gypsy-family LTR-RT. CJHS031206 contained three copies of a complete LTR-RT with nearly identical intervening sequences.
Using the amino acid sequence of the reverse transcriptase (RT) of representative LTR-RTs, the phylogenetic position of CJHS031206 was determined (Fig. 2). Four further LTR-RTs were included: IFG7 of Monterey pine (P. radiata); PHRE1 and PHRE2 of moso bamboo (Phyllostachys edulis), which are actively transposed by heat (Papolu et al., 2021); and ONSEN, a well-known heat-activated Ty1/copia LTR-RT of A. thaliana (Ito et al., 2011, 2013; Cao et al., 2015; Masuta et al., 2018). IFG7 is a member of the IFG family, which is abundant in conifer genomes and has an ancient origin. Among the compared LTR-RTs, CJHS031206 is closest to IFG7 and is a member of the Reina lineage. This LTR-RT lineage has chromodomains, which play roles in targeted integration into the genome (Novikova, 2009). Although CJHS031206, PHRE1 and PHRE2 belong to the Reina lineage and are activated by heat stress, they are phylogenetically distant from ONSEN. The activation of LTR-RTs in both superfamilies suggests that heat responsiveness was acquired independently in different LTR-RT lineages of phylogenetically distant species. Voronova et al. (2017) investigated the distribution of homologs of P. taeda IFG7 among 80 gymnosperm species using specific primer pairs that amplify the polyprotein region. Amplification with the IFG7-specific primers was prominent in the genomes of conifers distributed in extreme environments, such as silver fir (Abies alba) and Macedonian pine (P. peuce) at high elevations and Italian stone pine (P. pinea) from the southern part of its distribution in Syria and Libya. These authors argued that the expansion of IFG7 is associated with adaptation to extreme conditions in conifers. Although CJHS031206 shows sequence similarity to IFG7, their analysis suggested divergence of IFG7 in species of Conifers II. The evolution of the IFG family and its link to the evolution of coniferous trees should be targets of future studies.

Fig. 2. … (Llorens et al., 2011), except for the sequences of Cereba (AY040832.1) and Opie-2 (U68408.1), which were retrieved from the NCBI nucleotide database. The sequences of PHRE1 and PHRE2 were obtained from Supplementary Table 1 of Papolu et al. (2021). The sequence of ONSEN (AT1G11265) was retrieved from The Arabidopsis Information Resource. Translated sequences of their ORFs were aligned using Clustal W (Thompson et al., 1994), and the RT regions of Cereba, Opie-2, PHRE1, PHRE2 and ONSEN were extracted based on the alignment. The alignment of the RT regions was constructed, and the bootstrap maximum likelihood tree was built, using MEGA11 (Tamura et al., 2021).

TRANSCRIPTIONAL REGULATION OF LTR-RTs IN THE HEAT RESPONSE OF C. japonica
Ujino-Ihara (2020) detected putative heat shock factor (Hsf) binding sites within CJHS031206, suggesting that its heat induction is mediated by Hsf family proteins. HsfA2 homologs of C. japonica showed expression patterns similar to that of CJHS031206 in heat-stressed seedlings with or without heat acclimation (Ujino-Ihara, 2020). HsfA2 is the main regulator of acquired thermotolerance in A. thaliana (Charng et al., 2007) and is required to activate ONSEN (Cavrak et al., 2014). Therefore, an HsfA2 homolog is a strong candidate transcription factor for inducing CJHS031206 transcription. Transcripts of both CJHS031206 and HsfA2 were almost absent 24 h after heat stress (Ujino-Ihara, 2020; Fig. 1), consistent with the previous finding that HsfA2 transcripts were almost absent 24 h after heat shock in A. thaliana (Charng et al., 2007). The downregulation of HsfA2 may therefore lead to the downregulation of CJHS031206. However, transcription of small heat shock protein genes regulated by HsfA2 persisted longer than that of HsfA2 itself (Charng et al., 2007). This difference suggests that protein-coding genes involved in heat acclimation and LTR-RTs are repressed by different mechanisms. CJHS031206 transcription was repressed in heat-acclimated seedlings, even during heat stress (Ujino-Ihara, 2020; Fig. 1). Although the transcript level of CJHS031206 during the acclimation treatment itself was not analyzed in that study, its transcription was probably upregulated at that stage. Because excessive activation of LTR-RTs is detrimental to individuals, recently activated LTR-RTs may be silenced by epigenetic mechanisms, at least for some period of time.
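The search for putative Hsf binding sites can be illustrated with a simple motif scan. Hsf proteins bind heat shock elements (HSEs), which are commonly described as inverted repeats of the 5-bp unit nGAAn. The Python sketch below assumes a minimal HSE of two such units (nGAAnnTTCn or nTTCnnGAAn); the motif definition and software actually used by Ujino-Ihara (2020) may differ, and `find_hse_sites` is a hypothetical helper, not part of any published pipeline. The toy sequence is invented and is not part of CJHS031206.

```python
from __future__ import annotations

import re

# Assumption: a minimal heat shock element (HSE) is two inverted 5-bp
# units, nGAAn and nTTCn, in either order (nGAAnnTTCn or nTTCnnGAAn).
# Each pattern is its own reverse complement, so scanning one strand
# also reports sites on the opposite strand. The zero-width lookahead
# lets overlapping sites (e.g. within longer HSE arrays) all be reported.
HSE_REGEX = re.compile(
    r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]|[ACGT]TTC[ACGT]{2}GAA[ACGT]))"
)


def find_hse_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 10-mer) for each putative HSE in seq."""
    return [(m.start(), m.group(1)) for m in HSE_REGEX.finditer(seq.upper())]


if __name__ == "__main__":
    # Invented toy sequence containing one nGAAnnTTCn site.
    toy_ltr = "ACGTAGAACGTTCAGGT"
    for start, motif in find_hse_sites(toy_ltr):
        print(start, motif)  # prints: 4 AGAACGTTCA
```

Because both patterns are unchanged by reverse complementation, a single forward-strand scan is sufficient; a stricter search could require three or more alternating units or use a position weight matrix instead of a regular expression.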

LTR-RT EXPRESSED DURING DROUGHT RECOVERY IN Pinus halepensis MILLER
Another example of stress-responsive LTR-RTs was reported in Aleppo pine (P. halepensis Miller) (Fox et al., 2017). During recovery after irrigation had been suspended for 46 days, the transcription of LTR-RTs was strongly induced in this species. The predominant LTR-RT was a Ty1/copia retrotransposon (referred to as Tnt1-94). The Tnt1 family is a well-known active retrotransposon family from tobacco, and Tnt1 insertion preferentially occurs within, or close to, host gene coding sequences (Le et al., 2007). Therefore, the activation of Tnt1-94 may contribute to changes in gene expression during recovery from drought. Fox et al. (2017) also reported a large reduction in the expression of an H3K9 methyltransferase when the Tnt1-94 transcript was abundant after irrigation resumed. In the epigenetic regulation of transcription, histone H3 lysine 9 methylation (H3K9me) promotes a repressed state, whereas histone H3 lysine 4 methylation (H3K4me) promotes an active state (Bhadouriya et al., 2020). A reduction in the expression of lysine-specific histone demethylase 1 homolog 3, which removes H3K4 methylation, was also observed during recovery. Based on the expression patterns of these methylation-related transcripts, Fox and colleagues hypothesized that epigenetic regulation is partly responsible for the activation of TEs in Aleppo pine during recovery.

GENOME DISTRIBUTION OF LTR-RTs AND STRESS RESPONSE GENES
As described above, examples of the transcriptional activation of LTR-RTs by abiotic or biotic stress in conifers are accumulating. However, the transpositional activity of these LTR-RTs under stress conditions remains unclear, and few cases of recent transposition have been reported to date (Tamura et al., 2015; Stevens et al., 2016). Although direct evidence that LTR-RT insertions alter gene regulation in conifers has not been reported, the availability of whole-genome sequences has permitted a comprehensive analysis of the distribution of LTR-RTs in the genic regions of loblolly pine and sugar pine (Voronova et al., 2020). Those researchers constructed a conifer-specific database of 9,107 representative TE-derived repeat sequences acquired from PIER v.2.0 (Pine Interspersed Element Resource) (Neale et al., 2014). Using this database, they investigated whether particular TE families were enriched in stress-responsive genes involved in similar biological processes. A miniature inverted-repeat TE, which they named Plater, was the most highly enriched TE in gene-flanking regions, and two Ty1/copia family LTR-RTs were also frequent in those regions. However, TE frequency in gene-flanking regions was not high in the two species analyzed, and TE insertions were detected more frequently in intronic regions than in promoter regions. Whereas no clearly enriched biological function was detected among genes with insertions of identical TEs in their gene-flanking regions, genes with identical TE insertions within their introns were involved in defense and regulatory response processes. Voronova et al. (2020) also focused on the function of genes with multiple TE insertions. Such genes are annotated with many Gene Ontology (GO) terms, and the authors speculated that they are involved in many cellular processes and are essential for the response to the environment.
As mentioned above, the expansion of the IFG family in conifers may be associated with adaptation to the environment. Voronova et al. (2020) also examined the genomic distribution of the IFG family in detail; however, its involvement in stress responses remained unclear in the two conifers analyzed. Further studies using conifers that survive in extreme environments are therefore required to elucidate the function of the IFG family. In a recent report on the whole-genome sequence of coast redwood (Sequoia sempervirens), about 10,000 genes harbored Ty1/copia insertions (Neale et al., 2022). Some of these genes were associated with the phenylpropanoid and flavonoid biosynthesis pathways. Among the enriched GO biological processes for these genes, terms related to stress responses, such as response to biotic stress, bacteria, temperature and heat, were frequently observed. The enrichment of these terms supports a positive role of LTR-RTs in the stress responses of this species.
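GO enrichment of the kind reported for the pine and coast redwood genomes is commonly assessed with a one-sided hypergeometric (Fisher's exact) test. The Python sketch below shows that test for a single GO term; the function names are illustrative, the counts are hypothetical, and the cited studies may have used different tools, background sets and corrections.

```python
from __future__ import annotations

from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes with GO annotation in the genome (background)
    K: background genes annotated with the GO term of interest
    n: genes carrying a TE insertion (the test set)
    k: test-set genes annotated with the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # Sum exact probabilities of observing i = k..min(K, n) annotated genes;
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), min(K, n) + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Fraction of annotated genes in the test set over the background fraction."""
    return (k * N) / (n * K)


if __name__ == "__main__":
    # Hypothetical counts, not taken from any cited study.
    N, K, n, k = 20_000, 400, 1_000, 45
    print(f"expected = {n * K / N:.1f}, observed = {k}, "
          f"fold = {fold_enrichment(k, N, K, n):.2f}, "
          f"P = {hypergeom_sf(k, N, K, n):.3g}")
```

In practice the test is repeated for every GO term, so the resulting P values need correction for multiple testing (for example, with the Benjamini–Hochberg procedure) before a term is called enriched.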

FUTURE PERSPECTIVES
For perennial plant species such as conifers, adapting to the surrounding environment throughout a long lifespan is essential. Transcriptional activation of LTR-RTs by stress has been reported in conifers. Although LTR-RT transposition in conifer genomes and its effects on gene expression have not been demonstrated, insertions of LTR-RTs into genic regions in somatic cells may contribute to phenotypic variation within an individual and thereby to the rapid adaptation of individuals to environmental fluctuations. Conversely, for a species to adapt to sustained environmental change, adaptive phenotypic variation caused by LTR-RT insertions must be passed on to the next generation. Favorable polymorphisms inherited by later generations will contribute to the adaptation of the species as a whole only if they become widespread in the population.
CJHS031206 is an LTR-RT that appears to be involved in adaptation to heat stress in Japanese cedar, and it carries putative Hsf binding elements in its LTR regions. Insertion of an LTR-RT carrying heat response elements may enhance the heat responsiveness of the gene into which it is inserted and thereby contribute to heat stress adaptation. Because Japanese cedar is widely distributed in Japan, quantifying the copy number of CJHS031206 in trees from different geographical origins should also provide clues to its biological significance; for example, the copy number of ONSEN is correlated with the annual temperature range in A. thaliana (Quadrana et al., 2016). One advantage of Japanese cedar is that clonal propagation is more straightforward in this species than in other conifers. Because conifers are outcrossing species, producing genetically identical individuals is usually difficult. Using clonal material with identical genome sequences, new insertions and copy number variation of LTR-RTs under different environmental stresses can be evaluated in Japanese cedar. Associating LTR-RT insertion polymorphisms with specific environmental factors will help assess the phenotypic benefit conferred by such insertions.
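Such a survey could start from a simple correlation between copy number and a climatic variable across geographical origins, analogous to the ONSEN example. The Python sketch below is illustrative only: using Pearson's r with a permutation test is an assumption rather than the method of Quadrana et al. (2016), the helper names are hypothetical, and the data in the usage block are invented placeholders, not measurements.

```python
from __future__ import annotations

import math
import random


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y need the same length, at least 3")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x: list[float], y: list[float],
                        n_perm: int = 9999, seed: int = 0) -> float:
    """Two-sided permutation p-value for H0: no linear association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so permutations tying the observed |r| count as hits.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented placeholder values: annual temperature range (degrees C) at six
    # hypothetical origins, and hypothetical CJHS031206 copies per genome.
    temp_range = [18.5, 20.1, 21.7, 23.0, 24.4, 25.2]
    copies = [12, 15, 14, 19, 22, 21]
    r = pearson_r(temp_range, copies)
    p = permutation_p_value(temp_range, copies, n_perm=4999, seed=1)
    print(f"r = {r:.3f}, permutation P = {p:.4f}")
```

With only a few origins, a permutation test avoids relying on normality; a real survey would also need to consider population structure, because related populations can share both insertions and climate.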
The activation of various LTR-RTs by the same stress, as observed in Scots pine, suggests that both family-specific and global regulatory mechanisms underlie their transcriptional activation in conifers. Whole-genome sequences of additional conifers, including C. japonica, will become available shortly, and this growing body of genome information will help elucidate the role of LTR-RTs in the environmental adaptation of conifers.