2021 Volume 86 Issue 1 Pages 3-9
Genomic DNA is constantly exposed to various exogenous and endogenous stimuli that induce DNA lesions, including single-strand breaks (SSBs) and double-strand breaks (DSBs). Unrepaired DNA damage eventually has adverse effects on a wide range of cellular and physiological processes; therefore, it is of great interest to map damaged and repaired DNA and thereby elucidate the distribution of damage on a genome-wide scale. In the past decade, technological innovation in sequencing analysis has yielded several sequence-based approaches for detecting and quantifying such modified DNA, and these approaches have expanded our understanding of DNA damage and repair. This review provides an overview of next-generation sequencing-based methods for analyzing damaged DNA, with a focus on DNA strand breaks, namely SSBs and DSBs.
Genomic DNA is constantly exposed to exogenous stimuli, such as ultraviolet (UV) or gamma radiation (Takahashi et al. 2019, Suto et al. 2020), environmental pollution (Basu et al. 2019, Pramanik et al. 2019, Hani et al. 2020), artificial chemicals (Ergin et al. 2020, Gantayat et al. 2020), chemotherapeutic drugs (Gupta et al. 2020), and dietary genotoxins, as well as to endogenous stimuli arising during physiological cellular processes (Jackson and Bartek 2009, Paigen and Petkov 2010, Diaz and Pecinka 2017, Tubbs and Nussenzweig 2017, Toyoda and Matsunaga 2019, Marques et al. 2020, Mingard et al. 2020, Nishioka et al. 2020). These stimuli are known to induce approximately 70,000 lesions every day in each human cell, including single-strand breaks (SSBs) and double-strand breaks (DSBs), which are normally repaired via various DNA repair pathways (Kunkel 2015, Tubbs and Nussenzweig 2017). However, DNA damage is sometimes not repaired correctly, and the accumulation of unrepaired lesions is known to cause various pathological conditions, such as premature aging, cancer, and neurological disorders (Lord and Ashworth 2012, Baranello et al. 2014, Canela et al. 2016, González and Plasencia 2017). Therefore, to further explore the mechanisms by which DNA damage occurs and is repaired, it is essential to accurately analyze the position, quantity, frequency, and repair of DNA damage. Numerous methods have been developed to detect and quantify various types of DNA damage, including SSBs and DSBs.
Among these, fluorescence-based methods such as the halo assay, comet assay, terminal deoxynucleotidyl transferase (TdT) dUTP nick-end labeling (TUNEL) assay, and DNA breakage detection (DBD)-fluorescence in situ hybridization (FISH) are the most common techniques. The halo assay is based on the interaction of propidium iodide (PI) with the DNA helix, which changes the supercoiling status of the DNA. In this method, cells are lysed and the nucleoid of each cell is visualized as a “halo”; chromatin fragility can then be determined by measuring the halo area (Kumari et al. 2008, González and Plasencia 2017). The comet assay, which can quantify SSBs or DSBs, is based on the ability of denatured damaged DNA fragments to stream out of the nucleus under electrophoresis, producing a comet-like appearance. In this method, the spherical mass of undamaged DNA forms the comet head, and the damaged DNA forms the tail (Jyoti et al. 2013, González and Plasencia 2017). Another widely used method is the TUNEL assay, which detects SSBs or DSBs and the level of apoptosis by visualizing nuclei that contain DNA fragments (Kumari et al. 2008, González and Plasencia 2017). Combined with an in situ ligation assay, which is based on the ligation of double-stranded oligonucleotide probes by T4 DNA ligase, this method can specifically detect DSBs (Didenko and Hornsby 1996, González and Plasencia 2017). DBD-FISH offers improved resolution and, by incorporating FISH techniques, quantifies SSBs and DSBs in the whole genome or in a specific DNA sequence of a single cell (Fernández et al. 2001, González and Plasencia 2017). However, these fluorescence-based methods have certain drawbacks, such as background autofluorescence, irregular signals, and limited reproducibility (Levsky and Singer 2003).
Other methods, such as high-performance liquid chromatography-electrospray tandem mass spectrometry (HPLC-ES-MS), gas chromatography-mass spectrometry (GC-MS), and electrochemical methods (EM), are based on analytical strategies. HPLC-ES-MS can determine the location and quantity of SSBs by detecting modified nucleobases that occur during base excision repair (BER), a DNA repair pathway. Although this method is not economical, it is a sensitive and efficient technique for obtaining accurate data. GC-MS can sensitively detect various types of DNA damage, including SSBs. Similarly, EM can detect the majority of DNA damage products as well as SSBs at a low cost. Changes in DNA induced by reactive oxygen damage can be detected using electrochemical methods based on the inherent sensitivity of DNA-mediated charge transport (CT) (González and Plasencia 2017). By applying this method, DNA-mediated CT can be used to detect the damage mechanism in DNA repair enzymes (Boal and Barton 2005, González and Plasencia 2017).
Although the aforementioned technologies have greatly expanded our knowledge of DNA damage and repair, they cannot accurately locate DNA damage within the genome. To obtain an overview of DNA damage and repair, and to exploit this knowledge for clinical purposes such as cancer therapy, it is crucial to detect and quantify DNA damage genome-wide at high resolution and to identify the sequence-specific location of the damage. Here, we focus on DNA strand breaks and review recently developed sequence-based approaches that provide a genome-wide overview.
SSBs are the most common type of DNA damage occurring in cells, and these lesions may stall or collapse replication forks, thereby leading to more deleterious DSBs (Kuzminov 2001, Caldecott 2008, Cao et al. 2019). Most previous methods for investigating DNA breaks have been based on indirect labeling of breaks; however, a deeper understanding of DNA damage repair requires direct SSB detection techniques and a genomic landscape of SSBs. In this review, we describe three recent methods for mapping SSBs that feature direct detection of lesions.
In the SSB-Seq method, SSBs were labeled with nucleotides covalently linked to digoxigenin during a nick translation with DNA polymerase I. To increase the mapping resolution, dideoxynucleotides were included in the reaction to appropriately suppress chain elongation by DNA polymerase I. Labeled fragments were immunoprecipitated with anti-digoxigenin antibody (anti-DIG) and sequenced (Baranello et al. 2014).
The SSiNGLe methods are based on tagging the 3′-OH terminus that marks an SSB position by adding a poly(A) tail with terminal transferase (TdT). Before the tagging step, the high molecular weight DNA carrying 3′-OH termini is fragmented into a range of 150–500 base pairs using micrococcal nuclease (MNase), which leaves 3′-phosphate termini that cannot be recognized by TdT. Two different sequencing platforms are used to map SSBs genome-wide, namely Helicos Single Molecule Sequencing (SMS) and Illumina (ILM). The SMS-based approach (SSiNGLe-SMS) is relatively simple: 3′ poly(A) tails captured by oligo(dT) on flow cells can be sequenced directly without any extra steps. For SSiNGLe-ILM, three steps are added before NGS is performed: amplification with an oligo(dT) primer, 3′ poly(C) tailing with TdT, and PCR amplification with Illumina adaptor primers. Through these steps, SSiNGLe-ILM is adapted to the more widely used Illumina NGS platform and can produce longer reads. Therefore, the ILM-based strategy (SSiNGLe-ILM) is used to obtain more accurate mapping data of SSBs (Cao et al. 2019).
The GLOE-seq method has two strengths compared to the previous two methods. First, SSBs can be detected with higher resolution because the method does not use nick translation or poly(A) tailing, which may obscure the original terminus position (Sriramachandran et al. 2020). Instead, the 3′-OH ends of SSBs are directly ligated by T4 DNA ligase to a 3′ biotinylated adaptor, using a splinter oligonucleotide with a stretch of random bases that is hybridized to the adaptor (Gansauge et al. 2017). After ligation, fragmentation, and capture of the biotinylated adaptor, the fragments are subjected to NGS. Second, GLOE-seq can detect numerous types of damage in few steps. Specifically, this tool can map nicks, gaps, DSBs, Okazaki fragments, and various lesions, such as UV irradiation-induced pyrimidine dimers, abasic sites, or incorporated ribonucleotides (Sriramachandran et al. 2020).
DSBs arise in various physiological conditions such as meiosis, transcription, and replication stress, and during the repair of other types of DNA damage (Negrini et al. 2010, Baudat et al. 2013, Madabhushi et al. 2015, Schwer et al. 2016, Yan et al. 2017, Mingard et al. 2020). They are the most deleterious of the various lesions because they cause cell death, cell cycle arrest, translocations, deletions, amplification, and the formation of oncogenic mutations (Bennett et al. 1993, Sandell and Zakian 1993, Jackson and Bartek 2009, Zhang et al. 2010a, Crosetto et al. 2013). In the past two decades, several techniques have been developed for mapping DSBs genome-wide. The first methods for mapping DSBs focused on chromosomal translocations, which are known to occur frequently in lymphomas, leukemia, and solid tumors (Küppers 2005, Nussenzweig and Nussenzweig 2010, Tsai and Lieber 2010, Tsai et al. 2008, Zhang et al. 2010b, Klein et al. 2011). In translocation-capture sequencing (TC-Seq) and high-throughput genome-wide translocation sequencing (HTGTS), both established in 2011, a DSB was induced at a fixed location using the I-SceI meganuclease. The sequences of the regions that formed junctions with the induced DSB site were then determined genome-wide. In most cases, the induced DSB site undergoes end-joining with other DSB sites in the genome; therefore, this technique can be used for genome-wide mapping of DSBs (Chiarle et al. 2011). Linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS), established in 2015, combines LAM-PCR with the original HTGTS and is a more robust, cost-efficient, and rapid approach to map DSBs genome-wide (Frock et al. 2014, Hu et al. 2016).
DSBs can also be artificially directed to a specific position in genomic DNA. Genome editing techniques that use nucleases such as zinc finger nucleases (ZFNs); clustered, regularly interspaced, short palindromic repeats (CRISPR)-associated protein 9 (Cas9) nucleases; CRISPR RNA-guided nucleases (RGNs); and transcription activator-like effector nucleases (TALENs) are expected to be used in various studies and medical treatments. These techniques are effective means of genome editing; however, they are known to cause chromosomal translocations or DNA strand breaks outside the editing target site, known as the “off-target” effect. Analysis of off-target effects and of enzyme specificity is required to improve and stabilize genome editing technology. Exploiting the property that the linear double-stranded integrase-defective lentiviral vector (IDLV) genome is preferentially incorporated into DSBs during nonhomologous end-joining (NHEJ) repair, a method for unbiased genome-wide detection of DSBs using IDLV has been established (Gabriel et al. 2011, Wang et al. 2015). This method can detect DSBs caused by ZFNs, CRISPR-Cas9 nucleases, and TALENs in living cells. Similarly, genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq) labels DSBs induced by CRISPR RGNs in living cells with a double-stranded oligodeoxynucleotide (dsODN) that is integrated by NHEJ. This allows the unbiased mapping of DSBs genome-wide (Tsai et al. 2015). Digenome sequencing (Digenome-seq) also maps off-target effects of the CRISPR-Cas9 system in the genome of human cells, but is not limited by chromatin accessibility. In addition, compared with HTGTS and GUIDE-seq, this method omits certain steps, which avoids bias and provides more accurate data (Kim et al. 2015).
These DSB mapping methods have provided a genome-wide DSB landscape and have advanced both genome editing technology and our knowledge of DNA damage and repair systems; however, most of them are based on the indirect detection of DSBs, which may lead to an underestimate of the DSB frequency. In contrast, various methods for directly detecting DSBs have also been developed. Below, we briefly introduce the principles and features of these methods.
The BLESS (direct in situ breaks labeling, enrichment on streptavidin, and next-generation sequencing) method, established in 2013, was the first to directly map DSBs genome-wide in different cell types and conditions (Crosetto et al. 2013) and is the basis for many of the methods described below. After fixation, the DSB ends in the genomic DNA are blunted and 5′-phosphorylated. They are then labeled with a biotinylated linker in a ligation step that uses T4 DNA ligase, which is highly specific in that it ligates double-strand breaks but not single-strand breaks (Crosetto et al. 2013); the biotin label allows the subsequent streptavidin enrichment step. Finally, another linker that allows PCR amplification and sequencing is ligated to the free end of the captured fragments.
DSB-seq is a method established together with SSB-seq. Because blunting and 5′-OH phosphorylation of DSBs are not performed, it is simpler than the BLESS method. In the DSB-seq method, the 3′-OH termini of DSBs are labeled with biotinylated nucleotides by TdT, and the labeled fragments are selected with streptavidin after sonication. The samples are then subjected to Illumina library preparation and sequencing (Baranello et al. 2014).
Break-seq combines the Break-chip method, a microarray-based approach for the simultaneous mapping of single-stranded DNA (ssDNA) and double-strand breaks (DSBs), with next-generation sequencing (Hoffman et al. 2015). This makes it possible to map chromosomal damage genome-wide with much higher sensitivity and resolution than before. As in Break-chip, the DNA ends are tagged with biotinylated dATP by end repair using T4 DNA polymerase, and the DNA is fragmented and then captured with streptavidin. Thereafter, the captured DNA is amplified via PCR and sequenced.
END-seq is an improved and simple method that allows genome-wide mapping of end resection and DSBs in vivo with higher resolution and quantitative information. A comparison with BLESS revealed that formaldehyde fixation in the BLESS method may alter the structure of the DNA ends and decrease the sensitivity and specificity of detection (Canela et al. 2016). To circumvent this problem, live cells are embedded in low-melting agarose, and the genomic DNA is blunted, A-tailed, and ligated to a biotinylated hairpin adaptor for streptavidin selection. Thereafter, the DNA is extracted and fragmented, followed by a second round of end repair and A-tailing of the new ends created by fragmentation. PCR amplification after these operations yields a library for Illumina sequencing in which the first base of each read corresponds to the first base of a blunted DSB (Canela et al. 2016). This method is applicable to various tissues and organisms and allows the detection of a single DSB in 10,000 cells; moreover, the relative number of reads at a specific position is proportional to the fraction of cells carrying the DSB (Canela et al. 2016).
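Because the first base of each read marks the break end, a break position can be recovered from an alignment by taking the 5′ end of the read on its own strand. The following Python snippet is a minimal sketch of that bookkeeping, not part of any published pipeline; the `AlignedRead` record, the function names, and the 0-based half-open coordinate convention (as in BED files) are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal alignment record (0-based, half-open interval)."""
    chrom: str
    start: int   # leftmost aligned base, inclusive
    end: int     # one past the rightmost aligned base
    strand: str  # "+" or "-"


def five_prime_position(read: AlignedRead) -> int:
    """Return the genomic coordinate of the first sequenced base of the read."""
    if read.strand == "+":
        return read.start        # forward reads begin at their leftmost base
    if read.strand == "-":
        return read.end - 1      # reverse reads begin at their rightmost base
    raise ValueError(f"unknown strand: {read.strand!r}")


def break_end_counts(reads: Iterable[AlignedRead]) -> Counter:
    """Count reads per (chromosome, break-end position, strand)."""
    counts: Counter = Counter()
    for read in reads:
        counts[(read.chrom, five_prime_position(read), read.strand)] += 1
    return counts
```

Keeping the strand in the key separates the two ends of a break, which matters when end resection moves the two ends apart.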
The DSBCapture method resolved several issues of the conventional methods and improved on them substantially. In previous methods, sequencing started both from the captured DSB site and from the other end generated through fragmentation; therefore, in single-end sequencing, only half of the reads directly indicated the DSB site. Moreover, these methods require two rounds of PCR for the sequential addition of capture and sequencing adaptors, which may introduce bias (Aird et al. 2011, Lensing et al. 2016). The resulting library becomes less diverse and often needs to be diluted with a library unrelated to the sample to compensate for the lack of diversity. As a result, the data yield from the original sample is significantly reduced. To overcome these issues, the DSBCapture method uses a modified P5 Illumina adaptor. This adaptor allows sequencing to start only from the captured DSB site without any extra library preparation steps. In addition, by virtue of their high sequence diversity, the resulting libraries do not need to be diluted with another library. Thus, the DSBCapture method, with its simpler experimental operation, provides higher sensitivity and reproducibility as well as enhanced data yield and quality in genome-wide DSB mapping at nucleotide resolution compared to the conventional methods (Lensing et al. 2016).
The breaks labeling in situ and sequencing (BLISS) method is more versatile and sensitive than previous methods for mapping DSBs genome-wide owing to the following three features. The first is that sample cells or tissues fixed with formaldehyde are placed on a solid surface and DSBs are labeled directly on it. This allows all in situ reactions to occur without centrifugation, avoiding the risk of artificial DNA breaks or sample loss. The DSB ends are then blunted in situ and ligated to double-stranded DNA (dsDNA) adapters that include a sample barcode suitable for multiplexing, a random stretch of 8–12 nucleotides that serves as a unique molecular identifier (UMI), the RA5 Illumina sequencing adapter, and the T7 promoter sequence (Yan et al. 2017). This ligated dsDNA enables the second and third features. The second is the linear amplification of labeled DNA using T7-mediated in vitro transcription, which allows low-input samples to be analyzed. The third is easy multiplexing and scalability.
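The role of the UMI can be illustrated with a short counting routine: reads that share the same break end and the same UMI are treated as amplification copies of one labeling event and are counted once. The Python snippet below is a minimal sketch under simplifying assumptions, not the published BLISS pipeline; the function name and the input tuple layout are hypothetical, and UMIs are compared by exact match, whereas practical pipelines usually also tolerate sequencing errors within the UMI.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str]            # (chromosome, position, strand)
TaggedRead = Tuple[str, int, str, str]  # site fields followed by the UMI


def count_unique_breaks(tagged_reads: Iterable[TaggedRead]) -> Dict[Site, int]:
    """Count distinct UMIs observed at each break end.

    Reads with the same site and the same UMI are assumed to be
    amplification duplicates of a single ligation event.
    """
    umis_per_site: Dict[Site, Set[str]] = defaultdict(set)
    for chrom, pos, strand, umi in tagged_reads:
        umis_per_site[(chrom, pos, strand)].add(umi)
    return {site: len(umis) for site, umis in umis_per_site.items()}
```

Counting distinct UMIs rather than raw reads removes amplification duplicates, but the number of UMIs recovered still depends on how deeply the library is sequenced.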
Quantitative DSB sequencing (qDSB-Seq) is the first method developed to provide accurate data on both the genome-wide distribution of DSBs and their absolute frequency per cell. Although the BLISS method allows quantification together with genome-wide mapping of DSBs by counting the UMIs ligated to DSBs, UMI-based quantification is considered inaccurate and unstable because it is affected by the sequencing depth. Solving this problem within the BLISS method requires additional complicated and costly operations (Zhu et al. 2019). The qDSB-Seq method combines an arbitrary DSB labeling method with the induction of DSBs by a site-specific endonuclease, which enables direct quantification. After the labeled DSBs are sequenced, sequencing or qPCR is performed on genomic DNA to estimate the frequency of endonuclease-induced DSBs, which is then used to quantify the absolute DSB frequencies per cell in the sample (Zhu et al. 2019).
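The extrapolation step can be sketched as a simple proportion: if the endonuclease-induced breaks occur at a known frequency per cell and account for a known number of labeled reads, each labeled read corresponds to a fixed number of DSBs per cell. The Python snippet below is a minimal illustration of this idea only; the function and parameter names are hypothetical, the model assumes that labeled read counts scale linearly with break frequency, and the published calculation may include corrections that are not shown here.

```python
from typing import Tuple


def absolute_dsbs_per_cell(
    reads_total: int,
    reads_at_induced_sites: int,
    induced_dsbs_per_cell: float,
) -> Tuple[float, float]:
    """Scale labeled-read counts to absolute DSB frequencies per cell.

    reads_total            -- all labeled DSB reads in the sample
    reads_at_induced_sites -- labeled reads at the endonuclease cut sites
    induced_dsbs_per_cell  -- induced DSBs per cell, estimated separately
                              (for example from genomic DNA sequencing or qPCR)

    Returns (total DSBs per cell, DSBs per cell excluding the induced ones).
    """
    if reads_at_induced_sites <= 0:
        raise ValueError("no reads at the induced sites; cannot calibrate")
    if reads_total < reads_at_induced_sites:
        raise ValueError("reads_total must include the reads at induced sites")
    if induced_dsbs_per_cell < 0:
        raise ValueError("induced_dsbs_per_cell must be non-negative")

    # Proportional model: DSBs per cell = reads * (known DSBs / calibrating reads)
    total = reads_total * induced_dsbs_per_cell / reads_at_induced_sites
    background = total - induced_dsbs_per_cell
    return total, background
```

The same scaling factor can be applied to the read count of any individual site or region to obtain its DSB frequency per cell under the same linearity assumption.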
The DNA damage mapping technologies introduced in this review (summarized in Table 1) have clarified several aspects of DNA strand breaks. Studies using them have revealed that SSBs and DSBs do not occur randomly in the genome but are enriched in regulatory elements, specific repeats, exons, and introns, and that their distribution pattern changes with the biological state. The characteristics of, and differences in, the frequency with which various damage inducers cause DNA strand breaks have also been clarified. Moreover, a genome-wide overview of the off-target effects of nucleases used in genome editing technologies has been presented. These findings have greatly broadened our understanding of DNA strand breaks and their repair; however, these methods are specialized for the detection of DNA strand breaks only. Further improvements and broader applications of the technologies described in this review will enable us to comprehensively analyze various types of DNA damage, including DNA alkylation, misincorporation, DNA adducts, and strand breaks, thereby clarifying the whole picture of DNA damage and repair mechanisms.
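One common way to express such non-random distribution is a fold enrichment: the fraction of mapped breaks that fall within a class of genomic feature divided by the fraction of the genome that the feature class covers. The Python snippet below is a minimal sketch of this calculation and is not taken from any of the studies cited above; the function and parameter names are hypothetical, and the null model assumes that breaks would otherwise be spread uniformly along the genome, ignoring factors such as mappability and sequence composition.

```python
def fold_enrichment(
    breaks_in_feature: int,
    total_breaks: int,
    feature_bp: int,
    genome_bp: int,
) -> float:
    """Observed over expected fraction of breaks in a feature class.

    A value above 1 suggests enrichment and a value below 1 suggests
    depletion, relative to a uniform-by-length expectation.
    """
    if total_breaks <= 0 or genome_bp <= 0 or feature_bp <= 0:
        raise ValueError("counts and lengths must be positive")
    if breaks_in_feature < 0 or breaks_in_feature > total_breaks:
        raise ValueError("breaks_in_feature must lie between 0 and total_breaks")
    if feature_bp > genome_bp:
        raise ValueError("feature_bp cannot exceed genome_bp")

    observed_fraction = breaks_in_feature / total_breaks
    expected_fraction = feature_bp / genome_bp
    return observed_fraction / expected_fraction
```

Assessing whether an enrichment is statistically meaningful would additionally require a test against a suitable background, which this sketch does not attempt.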
This research was supported by grants from MEXT/JSPS KAKENHI (19H03259, 20H03297, and 20H05911) to S.M. and (19K06748) to T.S. S.M. is also supported by the Novartis Foundation, the Mitsubishi Foundation, and JST, CREST Grant Number JPMJCR20S6. We would like to thank Editage (https://www.editage.jp) for English language editing of this manuscript.