Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
Identification of aberrant transcription termination at specific gene loci with DNA hypomethylated transcription termination sites caused by DNA methyltransferase deficiency
Masaki ShiraiTakuya NaraHaruko TakahashiKazuya TakayamaYuan ChenYudai HiroseMasashi FujiiAkinori AwazuNobuyoshi ShimodaYutaka Kikuchi
Author information
JOURNAL OPEN ACCESS FULL-TEXT HTML
Supplementary material

2022 Volume 97 Issue 3 Pages 139-152

Details
ABSTRACT

CpG methylation of genomic DNA is a well-known repressive epigenetic marker in eukaryotic transcription, and DNA methylation of promoter regions is correlated with gene silencing. In contrast to the promoter regions, the function of DNA methylation during transcription termination remains to be elucidated. A recent study revealed that mouse DNA methyltransferase 3a (Dnmt3a) mainly functions in de novo methylation in the promoter and gene body regions, including transcription termination sites (TTSs), during development. To investigate the relationship between DNA methylation overlapping the TTSs and transcription termination, we performed bioinformatics analysis using six pre-existing Dnmt-/- mouse cell datasets: four types of neurons (three Dnmt3a-/- and one Dnmt1-/- mutants) and two types of embryonic fibroblasts (MEFs) (Dnmt3a-/- and Dnmt3b-/- mutants). Combined analyses using methylome and transcriptome data revealed that read counts downstream of hypomethylated TTSs were increased in three types of neurons (two Dnmt3a-/- and one Dnmt1-/- mutants). Among these, an increase in chimeric transcripts downstream of the TTSs was observed in Dnmt3a-/- mature olfactory sensory neurons and Dnmt3a-/- agouti-related peptide (protein)-producing neurons, thereby indicating that read-through occurs in hypomethylated TTSs at specific gene loci in these two mutants. Conversely, in Dnmt3a-/- MEFs, we detected reductions in read counts downstream of hypomethylated TTSs. These results indicate that the hypomethylation of TTSs can both positively and negatively regulate transcription termination, dependent on Dnmt and cell types. This study is the first to identify the aberrant termination of transcription at specific gene loci with DNA hypomethylated TTSs attributable to Dnmt deficiency.

INTRODUCTION

Transcription by RNA polymerase II (pol II), the synthesis of mRNA from DNA, involves three biological processes: initiation, elongation and termination. In higher eukaryotes, two transcription processes (initiation and elongation) and their epigenetic regulation, DNA methylation and histone modifications, have been extensively studied (Lee and Young, 2000; Jones, 2012; Gates et al., 2017; Chen et al., 2018). In transcription termination, it is also well known that most protein-coding genes in eukaryotes possess a polyadenylation signal (PAS) located near the end of the 3′-UTR, and a cleavage and polyadenylation (CPA) complex is assembled around the PAS sequence of the nascent pre-mRNA, which directs pol II transcription termination (Proudfoot, 2016; Eaton and West, 2020). However, in contrast to transcription initiation and elongation, epigenetic regulation of transcription termination has remained largely unstudied.

In eukaryotes, ranging from plants to humans, DNA methylation is well characterized as a repressive epigenetic regulatory mark and mainly occurs at CpG dinucleotides (Li and Zhang, 2014; Schübeler, 2015; Luo et al., 2018). Advances in sequencing technology have resulted in genomic methylation profiling at single-base resolution, revealing that global DNA methylation (5-methylcytosine, 5mC) occurs in mammals, except for short CpG-rich regions known as CpG islands (Li and Zhang, 2014; Schübeler, 2015; Luo et al., 2018). Studies of methylome profiling have shown that DNA methylation patterns dynamically change during development and disease and that these patterns are linked to gene expression in mammals (Smith and Meissner, 2013; Greenberg and Bourc’his, 2019). Genome-wide methylome analysis revealed that DNA methylation levels around the promoter regions, including the transcription start site (TSS), were negatively correlated with gene expression (Ball et al., 2009; Rauch et al., 2009). Conversely, DNA methylation levels in gene bodies were positively correlated with gene expression (Kulis et al., 2013; Teissandier and Bourc’his, 2017). Although two possible roles, regulation of splicing and prevention of cryptic promoters, have been proposed for methylation in the gene body (Kulis et al., 2013; Teissandier and Bourc’his, 2017), the significance of DNA methylation around transcription termination sites (TTSs; cleavage/polyadenylation sites) remains unresolved.

DNA methylation patterns are well known to be established by DNA methyltransferases (Dnmts), which are categorized into two types in mammals: maintenance Dnmt (Dnmt1) and de novo Dnmts (Dnmt3a and Dnmt3b) (Li et al., 1992; Okano et al., 1999; Li and Zhang, 2014). Among these, Dnmt1 functions in the maintenance of methylation patterns during DNA replication, and loss of genome-wide methylation has been reported in Dnmt1-deficient embryonic stem cells (ESCs) (Li and Zhang, 2014). Moreover, both Dnmt3a and Dnmt3b knockout mice, which die at embryonic day 9.5, fail to start de novo methylation after implantation (Okano et al., 1999), and genetic mutation of Dnmt3a or Dnmt3b leads to various human diseases (Greenberg and Bourc’his, 2019). This suggests a critical role for DNA methylation in development and homeostasis in mammals through the regulation of gene expression. Recent studies using Dnmt3a-/- or Dnmt3b-/- ESCs have revealed that exon methylation is greatly reduced in Dnmt3a-/- ESCs compared to that in Dnmt3b-/- ESCs, even though Dnmt3a is less enriched in exons than Dnmt3b (Gu et al., 2018). In addition, a very recent study using mouse embryonic fibroblasts (MEFs) with genetic ablation of de novo Dnmts reported that Dnmt3a functions in de novo methylation at both TSS regions (TSS ± 1,000 bp) and gene bodies (from TSS regions to TTS) of the Polycomb group target developmental genes, whereas Dnmt3b mainly functions in X-chromosome methylation (Yagi et al., 2020). These results indicate that although the enzymatic functions of both proteins are redundant, the genomic target regions are distinct in mammals. Although mammalian Dnmt3a mutant phenotypes and target regions have been reported, the role of methylation by Dnmt3a during transcriptional termination has not yet been reported. In the present study, we sought to determine the functional role of Dnmt3a-mediated hypomethylation at TTSs based on comparative analyses of Dnmt1 and Dnmt3b using publicly available methylome and transcriptome datasets.

RESULTS

Genomic characteristics of hypomethylated differentially methylated regions (HypoDMRs) overlapping the TTS in Dnmt-deficient mouse cells

To investigate the functional association between DNA methylation and transcription termination, we searched datasets of Dnmt3a-deficient mammalian cells that contained both methylome (whole-genome bisulfite sequencing (WGBS) and methylated DNA immunoprecipitation sequencing (MeDIP-Seq)) and transcriptome (RNA-seq) data. Based on dataset quality, we selected three mouse Dnmt3a-/- neuron datasets (mature olfactory sensory neurons [mOSNs] (Colquitt et al., 2014); agouti-related peptide (protein)-producing neurons [AgRPNs] (MacKay et al., 2019); and mature GABAergic inhibitory neurons [mGINs] (Lavery et al., 2020)), and a single Dnmt3a-/- MEF dataset (Yagi et al., 2020). To analyze the effect of hypomethylated TTSs on transcription termination, we used Dnmt1-/- mature parvalbumin-positive cortical interneurons [mPCIs] (Pensold et al., 2020) and Dnmt3b-/- MEFs (Yagi et al., 2020) as controls for Dnmt3a-/- neurons and Dnmt3a-/- MEFs, respectively. The following are brief descriptions of the four different types of mouse neurons (mOSNs, AgRPNs and mGINs for functional analysis of Dnmt3a-mediated hypomethylation; mPCIs for comparative analysis): mOSNs, sensory neurons located in the olfactory epithelium that transmit odorant information to the central nervous system; AgRPNs, GABAergic inhibitory neurons derived from the hypothalamic arcuate nucleus that produce the neuropeptide agouti-related peptide (protein); mGINs, GABAergic inhibitory neurons derived from the striatum; and mPCIs, GABAergic inhibitory neurons whose abundance is highest in the cerebral cortex.

We determined HypoDMRs overlapping TTSs, TSSs and gene bodies in the four types of mouse neurons (Dnmt1-/- mPCIs, Dnmt3a-/- mOSNs, Dnmt3a-/- AgRPNs and Dnmt3a-/- mGINs) (Fig. 1 and Supplementary Fig. S1) and two types of MEFs (Dnmt3a-/- and Dnmt3b-/- MEFs) (Fig. 2 and Supplementary Fig. S2). With respect to neurons, the median values of HypoDMRs in the whole genome and overlapping TTSs of the Dnmt1-/- mPCIs were approximately 2-fold lower than those in the Dnmt3a-/- mutants (mOSNs, AgRPNs and mGINs) (Figs. 1A and 1B). In contrast, the median value of HypoDMRs in TTSs was comparable (within approximately 1.25-fold) to that of the TSSs or gene bodies in each type of mutant neuron (Figs. 1C–1F). With respect to MEFs, the median values of HypoDMRs in the whole genome and TTSs of the Dnmt3a-/- mutants were similar to those of the Dnmt3b-/- mutants (Figs. 2A and 2B), whereas the median value of HypoDMRs in the TSSs was slightly lower (approximately 1.4-fold) than that in TTSs or gene bodies (Figs. 2C and 2D). In all of these Dnmt-/- mutants, TTSs, TSSs and gene bodies were hypomethylated, thereby indicating that these mutants could be used to investigate the role of methylation in TTSs. Histograms revealed that HypoDMRs overlapping the TTSs were almost symmetrically distributed across TTSs in the Dnmt1-/- mPCIs, Dnmt3a-/- mOSNs, Dnmt3a-/- AgRPNs, Dnmt3a-/- mGINs and Dnmt3a-/- as well as Dnmt3b-/- MEFs (Supplementary Figs. S3 and S4).

Fig. 1.

Characterization of HypoDMRs in four types of Dnmt-deficient mouse neurons. (A, B) Box plots showing log2-fold changes (mutant/WT) in DNA methylation levels of the HypoDMRs and HypoDMRs overlapping the TTSs (mPCIs, mOSNs, AgRPNs and mGINs). (C–F) Box plots showing log2-fold changes (mutant/WT) in DNA methylation levels of the HypoDMRs overlapping the TTSs, TSSs and gene bodies in mPCIs, mOSNs, AgRPNs and mGINs. In (A–F), “n” is the number of HypoDMRs and “med” is the median value.

Fig. 2.

Characterization of HypoDMRs in two types of MEFs. (A, B) Box plots showing log2-fold changes (mutant/WT) in DNA methylation levels of the HypoDMRs and HypoDMRs overlapping the TTSs (MEFs). (C, D) Box plots showing log2-fold changes (mutant/WT) in DNA methylation levels of the HypoDMRs overlapping the TTSs, TSSs and gene bodies in MEFs. In (A–D), “n” is the number of HypoDMRs and “med” is the median value.

Read counts downstream of TTSs

To explore the effect of hypomethylation overlapping the TTSs on transcription termination, we analyzed RNA-seq data obtained for four different types of Dnmt1-/-/Dnmt3a-/- neurons and two Dnmt3a-/-/Dnmt3b-/- MEFs. Read counts obtained for genes containing HypoDMRs overlapping the TTSs in wild-type (WT) and Dnmt-/- mutants were normalized in terms of TPM 3 kb [TTS/exonic region] (see Materials and Methods). A flowchart of our transcript analysis and the number of transcripts at each analysis step in the analysis of Dnmt-/- mutants are shown in Fig. 3 and Supplementary Tables S1–S8. It should be noted that some of the transcript isoforms share TTSs in our analysis. With the exception of Dnmt3a-/- mGINs, we detected an increase in the number of reads downstream of the hypomethylated TTSs in Dnmt3a-/- neurons (Fig. 4). We also found increased read counts downstream of TTSs in Dnmt1-/- mPCIs, suggesting that the level of methylation at TTSs is associated with aberrant transcription termination. The NADH:ubiquinone oxidoreductase core subunit S7 (Ndufs7) gene is shown in Fig. 5 as an example of the increased number of read counts downstream of the TTS in the Dnmt1-/- mPCIs. Conversely, we detected a significant reduction in the number of read counts downstream of hypomethylated TTSs in Dnmt3a-/- MEFs, compared to those in WT MEFs, although this was not observed in Dnmt3b-/- MEFs (Fig. 6). Collectively, the results obtained using Dnmt-/- mutants indicate that hypomethylation of TTSs has both positive and negative effects on the termination of transcription. We further examined transcription termination in hypomethylated TTSs in WT cells and found results similar to the aberrant transcription termination observed in Dnmt-/- mutant cells (Supplementary Fig. S5). These results support our hypothesis that hypomethylation of TTSs is the cause of aberrant transcription termination.

Fig. 3.

Schematic diagram of transcription termination data analysis. (A) A schematic illustration of HypoDMRs overlapping the TTSs in cells. Expressed transcripts were identified and analyzed when the expression levels were higher than the 1st quantile and the genomic region downstream of the TTSs was at least 3 kb. TTS Xn: the number of TTSs within each DMR. (B) Tabulated data numbers.

Fig. 4.

Metagene analysis of transcripts downstream of TTSs in four types of mutant neurons. (A, C, E and G) Box plots showing the normalized read counts in WT and Dnmt-/- mutants (mPCIs, mOSNs, AgRPNs and mGINs). Graphs on the right show enlarged views of boxed sections in the corresponding figures on the left. Expression levels were calculated using the following equation: TPM 3 kb [TTS] = number of reads aligned in 3 kb downstream of the TTS of each transcript × 106/total number of reads aligned in 3 kb downstream of the TTS. In addition, TPM 3 kb [TTS] scores were normalized using the following equation to consider the effect of the exonic region expression level: TPM 3 kb [TTS/exonic region] = TPM 3 kb [TTS] of each transcript/TPM of the exonic region of each transcript. (B, D, F and H) Pie charts depicting the numbers and percentages of transcripts with increased read counts (red), reduced read counts (blue) and unchanged read counts (white) in Dnmt-/- mutants (mPCIs, mOSNs, AgRPNs and mGINs).

Fig. 5.

Transcription termination defects in the Ndufs7 gene of Dnmt1-/- mPCIs. Genome browser views of the Ndufs7 gene in Dnmt1-/- mPCIs. Normalized read counts are shown as an example. Green dotted box and arrows represent the hypoDMR and increased read counts in Dnmt1-/- mPCIs, respectively.

Fig. 6.

Metagene analysis of transcripts downstream of TTSs in two types of MEFs. (A, C) Box plots showing the normalized read counts in WT and mutants (MEFs). The graphs on the right show enlarged views of boxed sections in the corresponding figures on the left. Expression levels were calculated using the following equation: TPM 3 kb [TTS] = number of reads aligned in 3 kb downstream of the TTS of each transcript × 106/total number of reads aligned in 3 kb downstream of the TTS. In addition, TPM 3 kb [TTS] scores were normalized using the following equation to consider the effect of the exonic region expression level: TPM 3 kb [TTS/exonic region] = TPM 3 kb [TTS] of each transcript/TPM of the exonic region of each transcript. (B, D) Pie charts depicting the numbers and percentages of transcripts with increased read counts (red), decreased read counts (blue) and unchanged read counts (white) in mutants (MEFs).

Chimeric transcripts increase in the HypoDMRs overlapping TTSs in Dnmt3a-/- mOSNs and Dnmt3a-/- AgRPNs

We subsequently analyzed the content of transcripts downstream of hypomethylated TTSs in the Dnmt1-/- mPCIs, as well as Dnmt3a-/- mOSNs and AgRPNs, and found that compared with WT cells, the frequency of chimeric transcripts was significantly increased in the Dnmt3a-/- mOSNs (approximately 7.4%) and AgRPNs (approximately 42.3%), although not in Dnmt1-/- mPCIs (Fig. 7). The Creatine kinase, brain (Ckb) and Ring finger protein 157 (Rnf157) genes are shown in Figs. 8 and 9, respectively, as examples of increases in the number of chimeric transcripts downstream of TTSs in Dnmt3a-/- mOSNs and AgRPNs. For these two genes, compared with those in WT neurons, chimeric transcripts were aberrantly fused with a downstream gene and/or intergenic DNA in Dnmt3a-/- mOSNs and AgRPNs (Figs. 8 and 9). In summary, the increase in chimeric transcripts detected in Dnmt3a-/- mOSNs and AgRPNs clearly indicates the occurrence of aberrant transcription termination in these mutants and that read-through transcription occurs in hypomethylated TTSs at specific gene loci.

Fig. 7.

Metagene analysis of chimeric transcripts. (A, C and E) Box plots showing the normalized read counts in WT and mutants (mPCIs, mOSNs and AgRPNs). The graph on the right in (C) shows an enlarged view of the boxed section in the figure on the left. (B, D and F) Pie charts depicting the numbers and percentages of transcripts with increased chimeric transcripts (red), decreased chimeric transcripts (blue) and unchanged chimeric transcripts (white) in mutants (mPCIs, mOSNs and AgRPNs).

Fig. 8.

Transcription termination defects in the Ckb gene of Dnmt3a-/- mOSNs. (A, B) Genome browser views of the Ckb gene in Dnmt3a-/- mOSNs. Normalized read counts (A) and chimeric transcripts (B), as examples, are shown. Green dotted boxes in (A and B) and arrows in (A) represent the hypoDMR and increased read counts in Dnmt3a-/- mOSNs, respectively.

Fig. 9.

Transcription termination defects in the Rnf157 gene of Dnmt3a-/- AgRPNs. (A, B) Genome browser views of Rnf157 gene in Dnmt3a-/- AgRPNs. Normalized read counts (A) and chimeric transcripts (B), as examples, are shown. Green dotted boxes in (A and B) and arrows in (A) represent the hypoDMR and increased read counts in Dnmt3a-/- AgRPNs, respectively.

DISCUSSION

Unlike transcription initiation and elongation, the function of DNA methylation during transcriptional termination remains unclear. To investigate the functional role of DNA methylation overlapping TTSs, we analyzed the effects of hypomethylation on transcription termination in Dnmt-deficient mouse cells. Our observations revealed higher read counts downstream of TTSs, not only in Dnmt3a-/- neurons (Dnmt3a-/- mOSNs and Dnmt3a-/- AgRPNs), but also in the Dnmt1-/- mPCIs used as controls (Fig. 4). In contrast, decreased read counts downstream of hypomethylated TTSs were observed in Dnmt3a-/- MEFs but not in Dnmt3b-/- MEFs (Fig. 6). Furthermore, analysis of downstream transcripts in hypomethylated TTSs in WT cells of mPCIs, mOSNs, AgRPNs and MEFs showed the same results as observed in Dnmt3a-/- or Dnmt1-/- mutants. This suggests that aberrant transcription termination is dependent on decreased methylation levels of TTSs in these mutant and WT cells. Because we cannot rule out the possibility that our results are influenced by the data analysis method, it is necessary to conduct an experimental analysis in the near future.

Although chimeric transcripts increased in both Dnmt3a-/- mOSNs and Dnmt3a-/- AgRPNs, we were unable to detect any such increase in Dnmt1-/- mPCIs (Fig. 7), even though the level of methylation at TTSs in the latter mutant was approximately half that in the Dnmt3a-/- mutants (Figs. 1A and 1B). There are two plausible explanations that we believe could account for these observations. First, the levels of methylation at TTSs may be independent of the formation of chimeric transcripts; second, read-through transcripts may be shorter in the Dnmt1-/- mPCIs than in the Dnmt3a-/- mutants.

We detected significant increases and reductions in the number of read counts downstream of hypomethylated TTSs in the three types of Dnmt-/- neurons (Dnmt1-/- mPCIs, and Dnmt3a-/- mOSNs and AgRPNs) and Dnmt3a-/- MEFs, respectively (Figs. 4 and 6), which indicates that there are two opposing effects on downstream transcription. In the case of heightened levels of transcription downstream of TTSs, it is considered that the termination of transcription at the TTSs is suppressed, and that the levels of transcription are increased because of read-through. Consistent with this hypothesis, we also identified an increase in chimeric transcripts downstream of TTSs in the Dnmt3a-/- mOSNs and AgRPNs (Fig. 7). Conversely, under circumstances in which there is a reduction in the number of reads downstream of TTSs, it is conceivable that transcriptional elongation is suppressed by the hypomethylation of TTSs. These findings from previous studies have indicated that DNA methylation in gene bodies, together with the histone modification H3K36me3, is associated with transcriptional elongation (Wagner and Carpenter, 2012; Teissandier and Bourc’his, 2017), and it is predicted that transcription is more likely to be terminated in response to hypomethylation at the TTSs. In the case of mGINs, we observed both increases and reductions in the number of downstream read counts at approximately 50% of the assessed genomic loci (Fig. 4H), thereby indicating that the positive and negative effects of gene body DNA methylation on transcription elongation function antagonistically. Collectively, our results indicate that although methylation of TTSs is an important factor contributing to transcription termination, it is not sufficient to control transcription termination.

Thus, it would be of particular interest to identify factors other than hypomethylation of TTSs that play important roles in regulating transcription termination. In this study, we used three GABAergic inhibitory neuron mutants (Dnmt3a-/- AgRPNs, Dnmt3a-/- mGINs and Dnmt1-/- mPCIs) in our analyses. Among these, we detected no aberrant termination of transcription in the Dnmt3a-/- mGINs, despite the hypomethylation of TTSs (Fig. 4). In contrast to these Dnmt3a-/- mGINs, we observed aberrant transcription termination in Dnmt3a-/- AgRPNs, a different type of GABAergic inhibitory neuron (Fig. 4), implying that the type of neuron is not a key factor. Furthermore, given our observations of reduced read counts in Dnmt3a-/- MEFs (Fig. 6), we speculated that the aberrant transcription termination is not a neuron-specific phenomenon. In this regard, recent studies have revealed that read-through similar to that observed in the present investigation occurs in cancer cells under inflammatory conditions as well as in cells that have been subjected to stress (Grosso et al., 2015; Proudfoot, 2016; Eaton and West, 2020). Based on the findings of our present bioinformatics analysis and those obtained in previous studies, we propose a model in which, in addition to DNA methylation, cell condition-specific factors, indicating the states of differentiation, proliferation and stress, play essential roles in regulating the termination of transcription (Fig. 10). This is because our results indicate that the hypomethylation of TTSs can both positively and negatively regulate transcription termination, dependent on Dnmt and cell types. With respect to chimeric RNAs, we hypothesize that normal methylation of the TTS regions suppresses the production of chimeric RNAs because it is difficult to generate long transcripts produced by read-through.

Fig. 10.

Proposed model of transcription termination.

Although we detected aberrant transcription termination at specific gene loci, characterized by hypomethylated TTSs associated with Dnmt deficiency, it is yet to be ascertained exactly how hypomethylation regulates transcription termination. Despite the relative lack of research on transcription termination, a unified allosteric/torpedo model has recently been proposed for the PAS-dependent termination (Eaton et al., 2020; Eaton and West, 2020). In this model, the CPA complex recognizes and cleaves the nascent transcript when pol II passes through the PAS; thus, the PAS is close to the TTS (10–35 bp upstream of the TTS) (Tian and Manley, 2017; Nourse et al., 2020). This cleavage by the CPA complex results in pol II pausing through dephosphorylation of the Spt5 elongation factor by a protein phosphatase 1 nuclear targeting subunit (Eaton and West, 2020). As paused pol II is dissociated from the DNA strand by 5′-3′ exoribonuclease 2, which degrades RNA in the 5′→3′ direction, the pol II pausing is thought to be a critical step for transcription termination (Proudfoot, 2016; Eaton and West, 2020). Although the CPA complex mainly binds to the nascent pre-mRNA and C-terminal domain of pol II, it is possible that DNA hypomethylation of the TTS or PAS genomic regions directly or indirectly suppresses or promotes CPA function, leading to changes in the state of pol II pausing. In PAS-dependent termination, there is another mechanism of pol II pausing referred to as heterochromatin-dependent pausing (Proudfoot, 2016). It has been reported previously that the heterochromatin structure downstream of the TTS causes pol II pausing by acting as an obstacle (Skourti-Stathaki et al., 2014; Proudfoot, 2016). Therefore, hypomethylation in the TTS region may lead to suppression or promotion of heterochromatin formation downstream of the TTS. Thus, it is conceivable that alternative transcription termination machinery functions in the termination of transcription by interacting with DNA methylation and cell condition-specific factors identified in the present and previous studies (Fig. 10).

Despite recent advances in our understanding of transcriptional termination, further molecular biological analyses examining hypomethylation of the TTS region are needed to gain more detailed insights into the molecular events underlying the processes whereby transcription is terminated.

MATERIALS AND METHODS

WGBS data analysis

The publicly available datasets used in this study were obtained from the Gene Expression Omnibus (GEO), and the accession numbers are GSE111172 (MEFs) (Yagi et al., 2020), GSE122405 (AgRPNs) (MacKay et al., 2019) and GSE124009 (mGINs) (Lavery et al., 2020). The FASTQ-format read sequences were trimmed of adaptors and low-quality bases were filtered using Trim Galore (version 0.6.4) (https://github.com/FelixKrueger/TrimGalore) with Cutadapt (version 2.6 or 2.10) (Martin, 2011) and FastQC (version 0.11.8 or 0.11.9) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). After trimming, the reads were aligned to the mouse genome (mm10) using Bismark (version 0.20.0) (Krueger and Andrews, 2011) and Bowtie (version 1.0.0) (Langmead et al., 2009). Read alignment was followed by the removal of PCR duplicates using the deduplicate_bismark (version 0.20.0) (Krueger and Andrews, 2011), and methylation calls were performed using bismark_methylation_extractor (version 0.20.0) (Krueger and Andrews, 2011). The reads from samples with multiple sequencing lanes were merged using SAMtools (version 1.9) (Li et al., 2009) after removing the PCR duplicates. The methylation ratios of CpG sites with a coverage of over five reads in WT, Dnmt3a-/- and Dnmt3b-/- were calculated from the methylation call data. The methylation ratio was visualized using the Integrative Genomics Viewer (IGV) (version 2.7.2) (Robinson et al., 2011).

MeDIP-Seq data analysis

The publicly available datasets used in this study were obtained from the GEO database, and the accession numbers are GSE145026 (6-month-old mice data) (mPCIs) (Pensold et al., 2020) and GSE52464 (mOSNs) (MacKay et al., 2019). FASTQ-format read sequences were trimmed of adaptors and low-quality bases were filtered using Trim Galore (version 0.6.4) (https://github.com/FelixKrueger/TrimGalore) with Cutadapt (version 2.6 or 2.10) (Martin, 2011) and FastQC (version 0.11.8 or 0.11.9) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). After trimming, the reads were aligned to the mouse genome (mm10) using Bowtie2 (version 2.3.5.1) (Krueger and Andrews, 2011). Aligned files (.bam) were sorted and indexed using SAMtools (version 1.9) (Li et al., 2009), followed by the removal of PCR duplicates using Picard MarkDuplicates (version 2.18.29) (Krueger and Andrews, 2011), with the subsequent merging of multiple sequencing lanes using SAMtools (version 1.9) (Li et al., 2009). Read coverage with 1 bin size was normalized by counts per million (CPM) mapped reads using the bamCoverage tool from DeepTools (version 3.3.2) (Ramírez et al., 2016) and visualized using the IGV (version 2.7.2) (Robinson et al., 2011).

Identification of differentially methylated regions (DMRs)

The methylation call data of WGBS were used to identify DMRs using swDMR (version 1.0.0) (Wang et al., 2015) with the following settings: cytosine type, CG; window, 500; step size, 50; points (lowest number of selected cytosine types in the window), 3; coverage, 5; fold (methylation level difference), 1.5; diff (value of max-min methylation level), 0.1; p-value, 0.05; and fdr, 0.05. The samples were compared using the Fisher’s test.

Aligned reads without MeDIP-Seq PCR duplicates were used to identify DMRs using the MEDIPS package of R software (version 1.38.0) (Wang et al., 2015) with the settings discussed here. MEDIPS.createSet function was applied with the following settings: uniq, 0; extend, 300; shift, 0; and window size = 700. MEDIPS.meth function was applied with the following settings: p.adj, BH; diff.method, edgeR; MeDIP, F; CNV, F; and minRowSum, 10. MEDIPS.selectSig function was applied with the following settings: p value, 0.05; adj, T; ratio, NULL; bg.counts, NULL; and CNV, F.

HypoDMRs were identified from the DMRs data and the genomic positions of HypoDMRs overlapping with TTSs, TSSs and gene bodies were determined using bedtools intersect (version 2.28.0) (Quinlan and Hall, 2010). HypoDMRs that overlapped with TTSs, TSSs and gene bodies were included in the HypoDMRs that overlapped with TTSs. HypoDMRs that overlapped with both TSSs and gene bodies were included in the HypoDMRs that overlapped with TSSs. Box plots, heat maps and density plots were generated using Python and R scripts. Individual methylation data for HypoDMRs overlapping TTSs are shown in Supplementary Table S1. Log2-fold changes (mutant/WT) in WGBS data were calculated from the DNA methylation ratio of swDMR outputs. To avoid division by zero, a value of 10−5 was added when calculating the fold changes. Log2-fold changes (mutant/WT) of MeDIP-Seq data were used to obtain edgeR.logFC scores of MEDIPS outputs.

RNA-seq data analysis

Publicly available datasets of mice deposited in the GEO database were used in this study. The accession numbers are GSE84165 (MEFs) (Yagi et al., 2017, 2019), GSE111172 (MEFs) (Yagi et al., 2020), GSE145026 (6-month-old mice data) (mPCIs) (Pensold et al., 2020), GSE52464 (mOSNs) (MacKay et al., 2019), GSE122405 (AgRPNs) (MacKay et al., 2019) and GSE123941 (mGINs) (Lavery et al., 2020). FASTQ files from the samples with multiple sequencing lanes were merged. The FASTQ-format sequence reads were trimmed of adaptors and low-quality bases were filtered using Trim Galore (version 0.6.4) (https://github.com/FelixKrueger/TrimGalore) with Cutadapt (version 1.18, 2.6 or 2.10) (Martin, 2011) and FastQC (version 0.11.8 or 0.11.9) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). After trimming, the reads were aligned to the mouse genome (mm10) using HISAT2 (version 2.1.0) (Kim et al., 2015). Aligned files (.bam) were sorted and indexed using SAMtools (version 1.9) (Li et al., 2009). Read coverage was normalized with 1 bin size by CPM mapped reads using the bamCoverage tool from DeepTools (version 3.3.2) (Ramírez et al., 2016) and visualized using the IGV (version 2.7.2) (Robinson et al., 2011). Splice junction data of the chimeric transcripts in the aligned files were inspected manually using the IGV (version 2.7.2) (Robinson et al., 2011).

Calculation of expression levels downstream of the TTS from RNA-seq data

Expression levels were quantified using StringTie (version 2.1.2) (Pertea et al., 2015) with NCBI RefSeq annotation, with the expression values of each transcript normalized in terms of transcripts per million (TPM). Transcripts expressed in WT and Dnmt-/- mutant cells were extracted from all transcripts (TPM > 0), and transcripts expressed in both strains at levels higher than the 1st quantile were extracted to delimit the transcriptionally active transcripts. In addition, transcripts with overlapping HypoDMRs across the TTS were used for subsequent analyses.

All reads that overlapped with the exonic or intronic sequences in the aligned files (.bam) were removed using bedtools intersect (version 2.28.0) (Quinlan and Hall, 2010) and the NCBI RefSeq annotation. In addition, the number of intergenic reads within 3 kb downstream of the TTS was determined using bedtools multicov (version 2.28.0) (Quinlan and Hall, 2010), with the exception of cases in which the genomic region downstream of the TTS was less than 3 kb. Expression levels were calculated using the following equation: TPM 3 kb [TTS] = number of reads aligned in 3 kb downstream of the TTS of each transcript × 106/total number of reads aligned in 3 kb downstream of the TTS.

In addition, TPM 3 kb [TTS] scores were normalized using the following equation to consider the effect of the exonic region expression level:

TPM 3 kb [TTS/exonic region] = TPM 3 kb [TTS] of each transcript/TPM of the exonic region of each transcript.

For a metagene analysis, we compared the normalized read counts (TPM 3 kb [TTS/exonic region]) between WT and Dnmt-/- mutant cells.

Box plots were generated using R scripts and statistical significance was determined using the Wilcoxon signed-rank test (P ≤ 0.05).

Calculation of expression level downstream of hypermethylated and hypomethylated TTSs in WT cells

The methylation levels (ratio) of the WGBS data were identified using DMAP (version 1.75) (Stockwell et al., 2014) from the Bismark output. Loci with scores of > 0.9 and ≤ 0.9 were defined as hypermethylated and hypomethylated loci for WGBS data, respectively. Methylation levels (RPKM) of MeDIP-Seq were identified from the MEDIPS output. Loci with scores of > 0.8 and ≤ 0.8 were defined as hypermethylated and hypomethylated loci for MeDIP-Seq data, respectively. The window size was 700 bp for both WGBS and MeDIP-Seq data. The hypermethylated and hypomethylated loci overlapping with the TTS were determined using bedtools intersect (version 2.28.0) (Quinlan and Hall, 2010). Calculation of expression levels downstream of the TTS was performed using a method based on “Calculation of expression levels downstream of the TTS from RNA-seq data” in the preceding section. Box plots were generated using R scripts, and statistical significance was determined using the Wilcoxon rank-sum test (P ≤ 0.05).

Calculation of expression level of chimeric transcripts from RNA-seq data

Chimeric transcripts that were aberrantly fused with downstream genes or intergenic DNA were manually counted using the IGV (Robinson et al., 2011). The expression levels were calculated as CPM mapped reads. Normalized read counts (CPM) were compared in WT and Dnmt-/- mutant cells using metagene analysis. Transcripts with expression that increased downstream of the TTS were used for the metagene analysis. Box plots were generated using R script. Statistical significance was determined using the Wilcoxon signed-rank test (P ≤ 0.05).

ACKNOWLEDGMENTS

We thank our laboratory members and Dr. Shyunya Hozumi for helpful discussion and critical comments. Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics. This work was supported by JSPS KAKENHI Grant Numbers JP18K06185 and JP21K06123 to Y. K.

CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest with the contents of this article.

REFERENCES
 
© 2022 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.
https://creativecommons.org/licenses/by/4.0/legalcode
feedback
Top