2023 Volume 98 Issue 4 Pages 179-189
Polyglutamine (polyQ) diseases are rare autosomal-dominant neurodegenerative diseases associated with the expansion of glutamine-encoding triplet repeats in certain genes. To investigate the functional influence of repeat expansion on disease mechanisms, we applied a biallelic genome-engineering platform that we recently established, called Universal Knock-in System or UKiS, to develop a human cell trio, a set of three isogenic cell lines that are homozygous for two different numbers of repeats (first and second lines) or heterozygous for the two repeat numbers (third line). As an example of a polyQ disease, we chose spinocerebellar ataxia type 2 (SCA2). In a pseudodiploid human cell line, both alleles of the glutamine-encoding triplet repeat in the SCA2-causing gene, ataxin 2 or ATXN2, were first knocked in with a donor sequence encoding both thymidine kinase and either puromycin or blasticidin resistance proteins under dual drug selection. The knocked-in donor alleles were then substituted with a payload having either 22 or 76 triplet repeats in ATXN2 by ganciclovir negative selection. The two-step substitution and subsequent SNP typing and genomic sequencing confirmed that the SCA2-modeling isogenic cell trio was obtained: three clones of 22-repeat homozygotes, two clones of 22/76-repeat heterozygotes and two clones of 76-repeat homozygotes. Finally, RT-PCR and immunoblotting using the obtained clones showed that, consistent with previous observations, glutamine tract expansion reduced transcriptional and translational expression of ATXN2. The cell clones with homozygous long-repeat alleles, which are rarely obtained from patients with SCA2, showed more drastic reduction of ATXN2 expression than the heterozygous clones. 
This study thus demonstrates the potential of UKiS as a platform for the efficient development of cell models, not only for polyQ diseases but also for other genetic diseases, which may accelerate both the understanding of disease mechanisms and cell-based screening for therapeutic drugs.
Disease-modeling cells (DMCs) are human cells that express cellular phenotypes related to specific diseases and thus are useful materials for medical and genetic research (Sterneckert et al., 2014; Avior et al., 2016). Induced pluripotent stem cells (iPSCs) are useful sources of DMCs (Yamanaka, 2009): iPSC pairs, which consist of two cell lines, one from a healthy individual and the other from a patient with a particular disease, can be differentiated into tissue type-specific cells related to the disease of interest. These can then be used to examine mechanisms of disease and to screen for drug candidates, as reported recently for drug discovery for amyotrophic lateral sclerosis treatment (Fujimori et al., 2018; Morimoto et al., 2019). In general, however, the genetic backgrounds of iPSC lines derived from different individuals are diverse, which prevents clear comparisons (Rouhani et al., 2014; Ben Jehuda et al., 2018). Therefore, in cases where a candidate disease-causing gene and associated mutation is known, incorporation of the mutation into cells from a healthy donor or correction of the mutation in patient-derived cells, either of which provides isogenic DMC pairs, is an alternative approach (Soldner et al., 2011; Ben Jehuda et al., 2018). Isogenic DMC pairs are thus paired cell lines that have an identical genetic background except that one line has the mutation of interest in the disease-causing gene candidate, whereas the control cells do not. Their comparison allows us to address whether mutations have any effects on disease-related cellular phenotypes and, if so, to proceed to studying the disease mechanism and developing new therapeutics in cell-based systems.
PolyQ diseases refer to a group of progressive neurodegenerative disorders (Shao and Diamond, 2007; Lieberman et al., 2019). In patients with these diseases, expansion of triplet repeats encoding glutamine stretches (mostly CAGs and sometimes CAA or CAT) is observed in the protein-encoding region of the disease-causing gene. To date, isogenic DMC pairs for polyQ diseases have been developed by genome engineering of disease-causing genes such as HTT (huntingtin; Huntington’s disease) (An et al., 2014; Xu et al., 2017; Dabrowska et al., 2020; Malankhanova et al., 2020), ATXN3 (ataxin 3; spinocerebellar ataxia-3) (He et al., 2021) and PPP2R2B (protein phosphatase 2 regulatory subunit Bbeta; spinocerebellar ataxia-12) (Li and Margolis, 2021).
Here we describe a novel approach for the development of isogenic DMCs for polyQ diseases. We used spinocerebellar ataxia-2 (SCA2) as our target polyQ disease and thus focused on the triplet repeat region in the first exon of the SCA2-causing gene, ataxin 2 or ATXN2. Whereas healthy individuals have 26 or fewer triplet repeats in ATXN2, expansion of the repeat region to 34 or more repeats is associated with SCA2, and an intermediate number of repeats (27–33) confers susceptibility to amyotrophic lateral sclerosis (Elden et al., 2010). Previously, Marthaler et al. (2016) reported their development of a SCA2 DMC pair by reducing the repeat number in SCA2 patient-derived iPSC lines to a number seen in healthy individuals by genome engineering. In their study, however, a homologous recombination (HR)-based knock-in was performed with a target fragment containing the neomycin resistance gene (Neo) cassette. This cassette was left in the first intron of ATXN2 only in the cell engineered to have a shortened repeat, not in the parental (i.e., non-engineered) cells, which had the long repeat. Thus, the resultant pair of cell lines was not genuinely isogenic because both lines differed not only in their repeat number but also in the presence/absence of the Neo cassette in the first intron of the disease-causing gene, ATXN2. The presence of the Neo cassette could influence the expression of ATXN2. For a more accurate comparison, engineering of the repeat number in the endogenous gene should be scarless, that is, it should leave no undesired sequence behind in the target locus.
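The repeat-number thresholds quoted above can be written as a small lookup. The following is only an illustrative sketch: the function name and category labels are hypothetical, and the cut-offs are simply those stated in the text (26 or fewer, 27–33, 34 or more).

```python
def classify_atxn2_repeat(n_repeats: int) -> str:
    """Classify an ATXN2 triplet repeat count using the thresholds quoted in the text."""
    if n_repeats < 1:
        raise ValueError("repeat count must be a positive integer")
    if n_repeats <= 26:
        return "healthy range"
    if n_repeats <= 33:
        return "intermediate (ALS susceptibility)"
    return "SCA2-associated"


# The two repeat numbers engineered in this study, plus one intermediate value.
for n in (22, 30, 76):
    print(n, classify_atxn2_repeat(n))
```

Under these thresholds, the 22-repeat allele falls in the healthy range and the 76-repeat allele falls in the SCA2-associated range.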
To develop truly isogenic DMCs for SCA2 for the first time, we here applied a large-scale genome-editing method called UKiS (Universal Knock-in System), which we recently established for biallelic and scarless genome editing in human cells (Fig. 1) (Ohno et al., 2022). UKiS is a two-step HR process. In the first step, each allele of the target repeat locus of ATXN2 in diploid cells was replaced with a UKiS donor containing either a puromycin (Puro) or a blasticidin (Blast) resistance cassette by Cas9 nuclease and a guide RNA (gRNA) that specifically targets the ATXN2 repeat locus, followed by dual positive selection. In the second step of UKiS, based on the herpes simplex virus thymidine kinase (TK) that is also encoded by both UKiS donors, the integrated UKiS donors were replaced using ganciclovir-mediated negative selection with either of two different targeting fragments containing longer or shorter triplet repeats. This step was also facilitated by the clustered regularly interspaced short palindromic repeats (CRISPR) system with a gRNA targeting a site adjacent to the left homology arm on the UKiS donor. With these two steps, we simultaneously obtained three different types of cell lines that were (1) short-repeat homozygous, (2) long-repeat homozygous and (3) short- and long-repeat heterozygous, without any extraneous sequences left at the ATXN2 locus. With a demonstration of the effects of the repeat length on ATXN2 expression, this study demonstrates that UKiS provides a novel platform for developing an isogenic DMC trio for triplet repeat diseases, which will allow investigations into the effects of repeat number differences on cell physiology.
The structure of the two UKiS donors used in this study is illustrated in Figure 2A. The two donors are identical except for the positive drug selection marker, which is the coding sequence for either puromycin-N-acetyltransferase or blasticidin-S-deaminase. In each donor, the marker is linked by a 2A self-cleaving peptide sequence to the coding sequence of herpes simplex virus TK. The TK nucleotide sequence that we used was optimized for human codon usage (Supplementary Fig. S1). The chimeric proteins for the positive and negative selection markers are expressed from the EF1α promoter. The entire marker regions are flanked on both sides by cHS4 insulators (Uchida et al., 2013). The resulting UKiS marker cassettes are surrounded on both sides by homology arms of ~700 bp, which target the ATXN2 repeat locus. The UKiS donors also have a CRISPR targeting sequence just next to one of the arms, cleavage of which is expected to enhance HR during the second step of UKiS (Cong et al., 2013; Mali et al., 2013). The target sequence of this gRNA (called Off-Target Less gRNA, TL-gRNA) was designed such that its sequence is not present in the human genome, which minimizes the risk of off-target cleavage, and also is optimized to be cleaved efficiently by the Cas9 system (Tálas et al., 2017).
We used this approach to make SCA2 model cell lines from a human colon cancer cell line, HCT116. Although SCA2 is a neurodegenerative disorder, and thus the ideal model cell for this disease should be of brain and/or neural tissue origin for downstream analysis of cell physiology, the goal of this study was to establish an experimental system for making polyQ DMCs, for which we used HCT116 cells. Because they are pseudo-diploid (Abdel-Rahman et al., 2001), unlike many aneuploid cancer cell lines, HCT116 cells are useful as a somatic human cell line for testing and establishing biallelic genome modification methodology.
HCT116 cells were co-transfected with both UKiS donor fragments and a pX330 plasmid (Cong et al., 2013) expressing both Cas9 and the ATXN2-specific gRNA. The ATXN2-specific gRNA was designed to cleave the site adjacent to the region targeted by the 702-bp homology arm to enhance HR efficiency (Fig. 2B). Among the cell colonies formed after simultaneous puromycin and blasticidin selection, seven were randomly selected, manually picked and expanded for PCR genotyping using six different pairs of primers (Fig. 2B, 2C). For each primer pair, one of the two primers was designed to target the region between the arms and the other was designed to target outside the arm region, allowing us to assess the integration of both UKiS donors into the ATXN2 locus. Of the seven clones tested, three (1-2, 1-5 and 1-7) had the expected knock-in of the UKiS donor: they displayed the expected length of PCR products generated with the four primer pairs for detecting the UKiS donor-integrated ATXN2 alleles, and no PCR product with the two primer pairs for detecting the original ATXN2 exon 1 alleles.
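The genotyping criterion described above (a product of the expected length with all four donor-junction primer pairs, and no product with either primer pair for the original allele) can be stated as a simple rule. This is a minimal sketch; the function name and the boolean encoding of gel results are assumptions made for illustration, not part of the published protocol.

```python
def has_biallelic_knock_in(donor_junction_hits, original_allele_hits):
    """Return True when a clone matches the expected junction-PCR pattern.

    donor_junction_hits: four booleans, True when a donor-junction primer pair
        gave a product of the expected length.
    original_allele_hits: two booleans, True when a primer pair for the
        original ATXN2 exon 1 allele gave a product.
    """
    if len(donor_junction_hits) != 4 or len(original_allele_hits) != 2:
        raise ValueError("expected four donor-junction and two original-allele results")
    return all(donor_junction_hits) and not any(original_allele_hits)


# Hypothetical gel read-outs (not data from this study).
print(has_biallelic_knock_in([True, True, True, True], [False, False]))  # biallelic knock-in
print(has_biallelic_knock_in([True, True, True, True], [True, False]))   # original allele remains
```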
Construction of longer and shorter triplet repeat fragments
We next constructed longer and shorter repeat fragments as targeting fragments for homologous recombination in the second step of UKiS. The shorter repeat fragment was cloned by PCR using genomic DNA from HCT116 cells as the template. The amplified region covers the ATXN2 exon 1 region and spans from the left-side edge of the 702-bp arm region to the right-side edge of the 750-bp arm region (Fig. 3A). The PCR products were subcloned and sequenced, showing that one of the clones had 22 repeats (20 CAGs and 2 CAAs at repeats 9 and 14 within the 22 triplet repeat tract). This was used as the plasmid with the ATXN2 shorter repeat fragment for the second step of UKiS. The longer repeat fragment was next made by replacing the repeat part on the shorter repeat fragment with a synthetic longer repeat, which was made by non-template PCR with two 60-nucleotide primers, (CAG)20 and (CTG)20 (Fig. 3B). Partial hybridization between these primers and subsequent DNA polymerization generated longer CAG/CTG double-stranded DNAs (dsDNAs), and the dsDNAs became longer as the number of PCR cycles increased (Fig. 3B) (Ordway and Detloff, 1996). We purified DNA from the agarose gel region corresponding to 200–300 bp in length (corresponding to 67–100 repeats) that was generated after 15 PCR cycles. The resulting DNA was ligated with the product from the outward PCR of the plasmid with the ATXN2 shorter repeat fragment, which resulted in the loss of only the repeat part (Fig. 3C). After cloning and sequencing of the resulting ligated plasmids, we obtained a plasmid with the ATXN2 longer repeat fragment, which contains 76 repeats of CAG at the correct position in-frame with the ATXN2 coding sequence.
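The composition of the two repeat tracts, and the conversion between the gel-purified size window and repeat number, can be checked with a short script. This is an illustrative sketch only: the helper names are hypothetical, and it models just the repeat tract itself, not the flanking ATXN2 sequence.

```python
import math

GLUTAMINE_CODONS = {"CAG", "CAA"}  # both codons encode glutamine


def build_repeat_tract(n_repeats, caa_positions=()):
    """Return a CAG tract with CAA substituted at the given 1-based repeat positions."""
    caa = set(caa_positions)
    return "".join("CAA" if i in caa else "CAG" for i in range(1, n_repeats + 1))


def encodes_only_glutamine(tract):
    """Check that the tract is a whole number of codons, all encoding glutamine."""
    codons = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    return len(tract) % 3 == 0 and all(c in GLUTAMINE_CODONS for c in codons)


def repeats_in_size_window(min_bp, max_bp):
    """Smallest and largest whole-repeat counts whose length lies within [min_bp, max_bp]."""
    return math.ceil(min_bp / 3), max_bp // 3


short_tract = build_repeat_tract(22, caa_positions=(9, 14))  # 20 CAGs and 2 CAAs
long_tract = build_repeat_tract(76)                          # 76 CAGs

print(len(short_tract), len(long_tract))  # tract lengths in bp
print(encodes_only_glutamine(short_tract), encodes_only_glutamine(long_tract))
print(repeats_in_size_window(200, 300))   # repeat range for the 200-300 bp gel slice
```

The 76-repeat tract is 228 bp long, which lies within the 200–300 bp window that was gel-purified.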
We proceeded with the second step of UKiS by co-transfection of both plasmids with the ATXN2 longer and shorter triplet repeat fragments and a pX330 plasmid expressing both Cas9 and the TL-gRNA into clone 1-2 (Fig. 4A), one of the HCT116 clones obtained after the first step of UKiS (Fig. 2C). Seven clones obtained after negative selection with ganciclovir were then subjected to PCR analysis. The primer pair used here allowed amplification of the ATXN2 locus including the repeat part and also a SNP site (rs889131359) that lies in intron 1 of ATXN2. We found that the original HCT116 cells were heterozygous at this SNP, with the reference sequence nucleotide (i.e., a C) in one allele and a 1-bp deletion in the other (Fig. 4A), which was useful for allele typing of the repeat for each cell clone.
Seven clones obtained after the second step of UKiS were analyzed by PCR to amplify the region of interest (Fig. 4A). Two clones (2-1 and 2-3) and three clones (2-2, 2-4 and 2-7) displayed a single amplified region of ~3.2 and ~3 kb, respectively. Direct sequencing of the PCR products extracted from these single-band regions of the gel confirmed the presence of 76 and 22 repeats in clones 2-1 and 2-2, respectively (Fig. 4B and 4C) and also demonstrated the presence of both the reference and non-reference sequence at rs889131359 in clones 2-1, 2-2, 2-3, 2-4 and 2-7, as seen in the original HCT116 cells (Fig. 4D). Thus, these five clones were homozygous for the 22 or 76 triplet repeats in the ATXN2 locus. The remaining two clones (2-5 and 2-6) yielded two separate PCR bands of ~3.2 and ~3 kb. The PCR products extracted from the gel were cloned, and the rs889131359 sites were sequenced (Fig. 4E). The ~3.2-kb and ~3.0-kb bands from both cell clones included the non-reference (1-base deletion) and reference (C) alleles, respectively, suggesting that these clones are heterozygous for the triplet repeats.
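The band-pattern reasoning in this paragraph can be summarized as a simple genotype call. The sketch below uses the approximate band sizes reported above; the tolerance value and the function name are assumptions chosen for illustration, and the SNP-based allele typing is not modeled.

```python
LONG_BAND_KB = 3.2   # approximate band size reported for the 76-repeat allele
SHORT_BAND_KB = 3.0  # approximate band size reported for the 22-repeat allele


def call_genotype(band_sizes_kb, tolerance_kb=0.05):
    """Call the repeat genotype from the PCR band sizes observed for one clone."""
    has_long = any(abs(b - LONG_BAND_KB) <= tolerance_kb for b in band_sizes_kb)
    has_short = any(abs(b - SHORT_BAND_KB) <= tolerance_kb for b in band_sizes_kb)
    if has_long and has_short:
        return "22/76 heterozygous"
    if has_long:
        return "76/76 homozygous"
    if has_short:
        return "22/22 homozygous"
    return "undetermined"


# Band patterns as described in the text for the seven second-step clones.
clone_bands = {
    "2-1": [3.2], "2-2": [3.0], "2-3": [3.2], "2-4": [3.0],
    "2-5": [3.2, 3.0], "2-6": [3.2, 3.0], "2-7": [3.0],
}
genotypes = {clone: call_genotype(bands) for clone, bands in clone_bands.items()}
for clone, genotype in genotypes.items():
    print(clone, genotype)
```

This reproduces the tally given in the text: three 22-repeat homozygotes, two 76-repeat homozygotes and two heterozygotes.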
We next assessed whether the transfected repeat fragments were randomly integrated into any genomic region at the second step of UKiS by inverse PCR (Fig. 5A). The extracted genomic DNAs from the seven clones and the original HCT116 cell line were digested with each of three restriction enzymes (EcoRI, HindIII and PstI) and then self-ligated. The resultant circularized DNAs were subjected to inverse PCR with a primer pair that could be elongated outward on both homologous arms. As a result, the only bands observed had lengths expected of those originating from the ATXN2 locus, and no other band appeared for any of the three restriction enzymes, indicating that random integration had not occurred in the obtained cell clones (Fig. 5B). To the best of our knowledge, this study succeeded in developing genuine isogenic DMCs for SCA2 for the first time.
We next examined the effect of a difference in the repeat number on ATXN2 expression. First, RT-PCR was used to amplify the repeat region using a primer pair corresponding to exon 1 and exon 2 sequences, which revealed the expected band pattern (Fig. 6A). The 22- and 76-repeat homozygous clones each displayed a single band of the corresponding length, 547 and 709 bp, respectively, whereas the heterozygous clones gave rise to both bands. The slower-migrating bands from the heterozygous clones apparently had a lower intensity than the faster-migrating bands. Likewise, the 76-repeat homozygous clones displayed a lower band intensity than the 22-repeat homozygous clones. Immunoblotting also revealed that the shorter repeat allele resulted in higher expression of ATXN2 (Fig. 6B). These data are consistent with a previous report about transgenic mice that expressed a human ATXN2 reporter transgene with different triplet numbers (Dansithong et al., 2015). Furthermore, the 76-repeat homozygous clones displayed a more drastic reduction in ATXN2 expression than did the heterozygous clones, suggesting the usefulness of DMCs that are homozygous for disease-type alleles for investigating functional impacts of the repeat expansion. It is of note that it is practically impossible to obtain cells homozygous for disease-type alleles from patients when the diseases are autosomal dominant.
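The two RT-PCR product lengths differ by exactly the length of the added repeats, which can be verified with a one-line calculation. This is a minimal sketch; the function name is hypothetical, and it assumes that the amplicons differ only in repeat number.

```python
def expected_amplicon_bp(reference_bp, reference_repeats, n_repeats):
    """Predict an amplicon length from a reference amplicon, adding 3 bp per extra repeat."""
    return reference_bp + 3 * (n_repeats - reference_repeats)


# Starting from the 547-bp product of the 22-repeat allele, predict the 76-repeat product.
long_amplicon_bp = expected_amplicon_bp(547, 22, 76)
print(long_amplicon_bp)
```

The predicted length is 709 bp, matching the reported size of the slower-migrating band.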
This study demonstrates the capability and usefulness of UKiS for generating isogenic DMC trios for polyQ disease and thus for various future applications. For example, by generating a stepwise increase in repeat numbers across multiple cell lines, such as 20, 30, 40 and 50 repeats or more in human iPSCs, one may be able to quantitatively characterize the effects of repeat number on disease-related phenotypes after differentiation into neural cells. In the case of ATXN2, it will be of interest to assess whether an intermediate number of repeats leads to molecular events distinct from what the longer repeat induces, which may help us to understand the molecular mechanisms that differ between SCA2 and amyotrophic lateral sclerosis (Elden et al., 2010).
HCT116 (American Type Culture Collection) cells were maintained in McCoy’s 5A medium (Life Technologies) supplemented with 10% bovine growth serum (Biowest) at 37 ℃ under a 5% CO2 atmosphere.
Genomic DNA isolation
From the original and modified HCT116 cell clones, genomic DNA was isolated using the Monarch Genomic DNA Purification kit (New England Biolabs).
Plasmid construction
We constructed the gRNA expression vector by inserting annealed oligonucleotides including target sequences and adaptor sequences into BbsI-digested pX330 (Addgene plasmid #42230) (Cong et al., 2013). The gRNA sequences were designed using CRISPRdirect (https://crispr.dbcls.jp/) (Naito et al., 2015), based on the human genomic reference sequence (hg38) in the UCSC Genome Browser (Kent et al., 2002). The nucleotide sequence of TL-gRNA used in this study was from a previous report (Tálas et al., 2017).
To construct the UKiS first-step donor plasmid, we first synthesized the coding sequence for herpes simplex virus thymidine kinase (HSVTK) by oligonucleotide assembly. The nucleotide sequence that encodes HSVTK was based on the amino acid sequence of a mutant TK (SR39) (Black et al., 2001) but was codon-optimized for efficient expression in human cells (the nucleotide and amino acid sequences are shown in Supplementary Fig. S1). Next, the regions for UKiS positive selection markers (puromycin resistance and blasticidin resistance) were amplified by PCR from UKiS donor plasmids constructed previously (Ohno et al., 2022). The homology arms were amplified by PCR with genomic DNA isolated from HCT116 cells, and correspond to chr12:111,598,192–111,598,941 and chr12:111,599,269–111,599,970 (hg38). Finally, the HSVTK coding sequence, either positive selection marker and both homology arms were assembled and integrated into the EcoRI and BamHI sites of the HR110PA-1 plasmid (Systems Biosciences) using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs).
To construct the UKiS second-step donor plasmid that contains the shorter repeat fragment, the genomic region of chr12:111,598,192–111,599,970 (hg38), which corresponds to the genomic region covered by both homology arms and the region between them, was amplified by PCR from genomic DNA of the original HCT116 cells. The resultant PCR product was cloned into the EcoRI/BamHI-digested HR110PA-1 plasmid using NEBuilder HiFi DNA Assembly Master Mix. DNA sequencing of the resulting plasmid demonstrated that the repeat number is 22.
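The arm lengths quoted in the Results (750 and 702 bp) follow directly from the hg38 coordinates given above. The sketch below assumes that the coordinates are 1-based and inclusive; the function and variable names are illustrative.

```python
def span_bp(start, end):
    """Length of a genomic interval given 1-based, inclusive coordinates (an assumption)."""
    return end - start + 1


arm_a_bp = span_bp(111_598_192, 111_598_941)     # first homology arm
arm_b_bp = span_bp(111_599_269, 111_599_970)     # second homology arm
between_bp = span_bp(111_598_942, 111_599_268)   # region between the arms
full_bp = span_bp(111_598_192, 111_599_970)      # region cloned for the second-step donor

print(arm_a_bp, arm_b_bp, between_bp, full_bp)
```

The two arms and the region between them add up to the full amplified region, as expected for a fragment that spans both arms.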
To construct the UKiS second-step donor plasmid that contains the longer repeat, non-template PCR was first performed with two primers, (CAG)20 and (CTG)20, using 1× KOD One PCR Master Mix (Toyobo). The following thermal cycles were used: 98 ℃ for 30 s, followed by 5, 10, 15 or 20 cycles of 98 ℃ for 10 s, 62 ℃ for 5 s and 68 ℃ for 10 s. The PCR products of 200–300 bp in length that were generated after the 15-cycle reaction were gel-purified, 5′-phosphorylated and blunt-end-ligated with the linearized plasmid with the ATXN2 shorter repeat fragment from which only the CAG repeat part had been removed by inverse PCR using primers that hybridize to sequences in the flanking regions on either side of the repeat part. DNA sequencing of the resulting plasmid demonstrated that the repeat number is 76.
The cloned sequences in the constructed plasmids were all confirmed with the Big Dye Terminator v3.1 Cycle Sequencing kit (Thermo Fisher Scientific) using an ABI 3100 DNA sequencer (Applied Biosystems). All primers and oligonucleotides used for plasmid construction are listed in Supplementary Table S1.
Transfection and selection
Plasmid DNA used for transfection of HCT116 cells was prepared using the QIAGEN-tip 20 (Qiagen) and the EndoFree Plasmid kit (Qiagen). HCT116 cells were transfected using FuGENE HD transfection reagent (Promega) according to the manufacturer’s instructions. For the first step of UKiS, 2 × 10⁵ HCT116 cells were cultured in six-well plates. For the second step, 2 × 10⁵ HCT116 cells for each clone with the UKiS donors on both alleles were cultured in six-well plates. Approximately 14 h later, cells were co-transfected with the gRNA/Cas9 expression plasmid as well as the UKiS donor plasmids (for the first step) or the plasmids carrying the ATXN2 shorter repeat fragment, (CAG)20 and (CAA)2, and the longer repeat fragment, (CAG)76 (for the second step). After incubation at 37 ℃ with 5% CO2 for 12 h, selection was started with 1 μg/ml puromycin (Nacalai Tesque) and 10 μg/ml blasticidin (Funakoshi) for the first step or with 1 ng/ml ganciclovir (Wako) for the second step. Individual cell colonies were picked with 200-μl pipette tips for single cloning and clone expansion.
PCR and sequencing for knock-in assessment
Genomic DNAs were amplified using 1× KOD One PCR Master Mix with the following thermal cycles: 98 ℃ for 30 s, followed by 45 cycles of 98 ℃ for 10 s, 62 ℃ for 5 s and 68 ℃ for 60 s. The PCR products were subjected to 1% agarose gel electrophoresis using Agarose S (Nippon Gene) in 1×TAE (40 mM Tris-HCl, 20 mM acetic acid and 1 mM EDTA, pH 8.0). When proceeding to direct sequencing, the PCR products were extracted from gel pieces using the Zymoclean Gel DNA Recovery kit (Zymo Research) and were then sequenced using the Big Dye Terminator v3.1 Cycle Sequencing kit and an ABI 3100 DNA sequencer. When cloning was needed before sequencing, the gel-purified PCR products were cloned into the EcoRI/BamHI-digested HR110PA-1 plasmid using NEBuilder HiFi DNA Assembly Master Mix. All primers used for junction PCR and direct sequencing are listed in Supplementary Table S2.
Inverse PCR for random integration assessment
Genomic DNAs (1 μg) extracted from each of the seven clones were digested with EcoRI-HF (New England Biolabs), HindIII-HF (New England Biolabs) or PstI (New England Biolabs) and then self-ligated with T4 DNA ligase (New England Biolabs) in a 200-μl total reaction volume. After purification using the Zymoclean Gel DNA Recovery kit, 20 ng of the ligated DNA was used in subsequent PCR using 1× KOD One PCR Master Mix with the following thermal cycles: 98 ℃ for 30 s, followed by 50 cycles of 98 ℃ for 10 s, 65 ℃ for 5 s and 68 ℃ for 2 min. The PCR products were subjected to 1% agarose gel electrophoresis using Agarose S in 1×TAE. All primers used for inverse PCR and direct sequencing are listed in Supplementary Table S2.
RNA extraction
From the original and modified HCT116 cell clones, total RNA was extracted using the RNeasy Mini kit (Qiagen).
RT-PCR
Total RNA (1 μg) was reverse-transcribed with the SuperScript IV First-Strand Synthesis System (Invitrogen) using oligo-dT and random hexamers as primers according to the manufacturer’s instructions. The target gene sequence was amplified using 1× KOD One PCR Master Mix with the following thermal cycles: 98 ℃ for 30 s, followed by 40 cycles of 98 ℃ for 10 s, 62 ℃ for 5 s and 68 ℃ for 60 s. The PCR products were subjected to 1% agarose gel electrophoresis using Agarose S in 1×TAE. The primer sequences used in this experiment are provided in Supplementary Table S2.
Immunoblotting
After trypsinization with 0.25% (w/v) Trypsin-EDTA Solution (Nacalai Tesque), cells were collected at 300 g for 3 min. Cell pellets were lysed with lysis buffer (50 mM Tris-HCl, 1% sodium lauryl sulfate, pH 8.0). Protein concentrations were determined with the Pierce Rapid Gold BCA Protein Assay kit (Thermo Fisher Scientific). Lysate proteins (1 μg/sample) were subjected to SDS-PAGE (10% polyacrylamide) and electroblotted onto an Immobilon-P membrane (Millipore). Each blot was incubated overnight at 4 ℃ with an appropriate primary antibody diluted as follows: anti-ATXN2 (1:500; 611378, Becton Dickinson), anti-α-Tubulin (1:5,000; 017-25031, Wako). Next, each blot was incubated with horseradish peroxidase (HRP)-labeled anti-mouse IgG (1:5,000; NA931V, GE Healthcare). Immunoreactive proteins were detected with Immobilon Western Chemiluminescent HRP substrate (Millipore) and the ImageQuant LAS 4000 system (Cytiva). The band intensities were determined and expressed as pixel densities using ImageJ software (National Institutes of Health) (Schneider et al., 2012), and these values were normalized against the intensity of the corresponding loading control (α-Tubulin).
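The normalization described above amounts to dividing each ATXN2 band intensity by the α-Tubulin intensity of the same lane. The following is a minimal sketch of that calculation; the function name is hypothetical, the pixel densities are placeholder values rather than measurements from this study, and expressing the result relative to a control lane is an extra illustrative step that the text does not describe.

```python
def normalize_to_loading_control(target_intensity, control_intensity):
    """Divide a target band intensity by the loading-control intensity of the same lane."""
    if control_intensity <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target_intensity / control_intensity


# Placeholder pixel densities (ATXN2, alpha-Tubulin); NOT measurements from this study.
lanes = {
    "control lane": (1200.0, 1000.0),
    "test lane": (600.0, 1000.0),
}

normalized = {name: normalize_to_loading_control(t, c) for name, (t, c) in lanes.items()}
# Illustrative extra step: express each lane relative to the control lane.
relative = {name: value / normalized["control lane"] for name, value in normalized.items()}
print(normalized)
print(relative)
```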
Y. A. is a co-founder and CSO of Logomix, Inc.
We thank the members of the Aizawa laboratory for their valuable discussions. This work was supported by JST CREST “Large-scale genome synthesis and cell programming” (Grant Number: JPMJCR18S5) to Y. A. A grant-in-aid (IBUNNYAYUGO) was provided to Y. A. by the Kanagawa Prefectural Government for Integration of Advanced Multidisciplinary Research Activities.