Scarless genome editing technology and its application to crop improvement

The advent of CRISPR/Cas9 has had a disruptive impact on the world by bringing about dramatic progress and rapid penetration of genome editing technology. However, even though gene disruption can be easily achieved, there has been a challenge in freely changing the sequence. To solve this problem, various novel technologies have emerged in recent years to realize free rewriting of genome sequences. In this review, scarless editing by two-step HDR, a technology that can freely rewrite genomes from a single nucleotide to more than several thousand nucleotides, will be introduced.

ease is not comprised of gene deletions or amino acid substitutions, but rather of copy number variations and SNPs in regulatory regions of gene expression, such as promoters or non-coding regions, not only in humans (Alsheikh et al. 2022, Hindorff et al. 2009) but also in plants such as maize (Wallace et al. 2014). Identified sequence variation can be introduced by crossbreeding, but due to linkage disequilibrium, several megabases of surrounding sequence are introduced simultaneously, which makes high-resolution analysis difficult in physiological studies. In addition, several generations of backcrossing are required to reduce the risk of linkage drag, which may cause problems in variety improvement (Epstein et al. 2023). Therefore, it would be ideal if only the causal variant could be introduced by genome rewriting technology, but such sequence alteration is challenging with conventional genome editing methods.

Genome rewriting difficulties and editing scar
To rewrite a genome to a desired sequence using CRISPR/Cas9, the editing must be mediated by homology-directed repair (HDR) rather than by nonhomologous end joining (NHEJ) or microhomology-mediated end joining (MMEJ), which cause gene disruption (Porteus 2016). However, genome editing via HDR is not easy because of its low efficiency and because of design constraints that leave editing scars. The efficiency of HDR-mediated genome editing is usually less than one-tenth to one-hundredth of the frequency of gene disruption via NHEJ or MMEJ, and cells edited as intended by HDR often account for less than 1% of the cell population that receives editing (Kim et al. 2022, Luo et al. 2023). In this case, more than several hundred clones must be screened to obtain cells with the intended edit. The term editing scar covers two things. The first is an additional sequence change in the CRISPR/Cas9 target sequence that must be intentionally introduced to prevent CRISPR/Cas9 from continuing to cut the target site after the intended sequence change by HDR, which would otherwise result in gene disruption via NHEJ or MMEJ (Fig. 1A). The second is an unwanted sequence left behind after removal of a selection marker, for example by the Cre/loxP system, when the use of selection markers is unavoidable to obtain correctly edited cells because of low HDR efficiency (Fig. 1B). CRISPR/Cas9 tolerates mismatches (Hsu et al. 2013), and even with Cas9 mutants of improved fidelity, it is difficult to avoid recutting and the introduction of undesired mutations by substituting a single nucleotide outside the PAM sequence, which is critical for sequence recognition by Cas9 (Kim et al. 2023). In addition, the efficiency of introducing the intended edit by HDR decreases exponentially with the distance between the editing position and the cutting position (cut-to-edit distance), which is determined by the guide RNA sequence (Paquet et al. 2016). Therefore, the cases in which CRISPR/Cas9 has succeeded in making single nucleotide substitutions, such as the variants detected by GWAS, without leaving editing scars are limited to substitutions in the PAM sequence or mutations very close to the cleavage site.
Base editing (Komor et al. 2016) and prime editing (Anzalone et al. 2019) have emerged as other means of genome rewriting. While these are very attractive tools and are available in plant cells (Jiang et al. 2020, Li et al. 2020), base editing is limited in where it can edit (Jeong et al. 2020), and prime editing remains inefficient and requires complex design and optimization (Hassan et al. 2020). Also, these tools can only edit where a guide RNA can be designed. To edit a specific genomic location, the guide RNA must target a unique sequence. In species with complex genomes, such as maize, in which nearly 85% of the genome consists of highly repetitive sequences (Schnable et al. 2009), and bread wheat, a hexaploid rich in similar sequences (Zimin et al. 2017), it is rarely possible to specify a single genomic region with the roughly 20-base sequence recognized by a designed guide RNA. In such situations, it is often impossible to design an appropriate guide RNA for the region to be edited.
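The two design constraints discussed above, cut-to-edit distance and guide RNA uniqueness, can be illustrated with a minimal Python sketch. It assumes SpCas9 with an NGG PAM and a blunt cut 3 bp upstream of the PAM, which the text does not state explicitly; the function names and the toy sequence are hypothetical, and real guide design tools also account for mismatches and other factors.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq):
    """List (protospacer_start, cut_index) for forward-strand sites.

    Assumption: a site is a 20-nt protospacer followed by an NGG PAM, and the
    cut falls 3 bp upstream of the PAM, i.e. just before index start + 17.
    """
    sites = []
    # A lookahead is used so that overlapping candidate sites are all found.
    for match in re.finditer(r"(?=[ACGT]{20}[ACGT]GG)", seq):
        start = match.start()
        sites.append((start, start + 17))
    return sites


def cut_to_edit_distance(cut_index, edit_index):
    """Distance in bases between the cut boundary and the edited base."""
    return abs(edit_index - cut_index)


def count_target_sites(genome, protospacer):
    """Count exact protospacer + NGG matches on both strands."""
    pattern = re.compile("(?=" + re.escape(protospacer) + "[ACGT]GG)")
    strands = (genome, reverse_complement(genome))
    return sum(len(pattern.findall(strand)) for strand in strands)


# Hypothetical toy sequence: 2-nt flank, 20-nt protospacer, AGG PAM, 4-nt flank.
toy = "TT" + "ACGT" * 5 + "AGG" + "TTTT"
sites = find_spcas9_sites(toy)
unique_hits = count_target_sites(toy, "ACGT" * 5)
repeated_hits = count_target_sites(toy + toy, "ACGT" * 5)
```

In this toy case the guide matches once in `toy` but twice when the same region is duplicated, which is the situation that makes guide design difficult in repeat-rich genomes.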

Why is scarless editing important?
If the purpose of research or breeding is to change the amino acid sequence encoded by the gene of interest, editing scars may be tolerated when they are introduced as synonymous substitutions that do not change the amino acid sequence or are placed in untranslated regions (Kim et al. 2022). However, the nonrandom and biased use of synonymous codons during evolution in many organisms (Chiapello et al. 1998, Hershberg and Petrov 2008) suggests that synonymous substitutions are not always functionally synonymous. In fact, in eukaryotes, synonymous substitutions and intronic mutations that do not change the amino acid sequence are known to affect splicing (Sarkar et al. 2022), mRNA stability, and protein translation kinetics, and can also significantly affect the three-dimensional structure of proteins (Brule and Grayhack 2017). Consistent with these experimental findings, numerous human genetic diseases have been shown to be caused by synonymous substitutions (Bali and Bebok 2015). Therefore, it is very difficult to predict how an editing scar will affect function even if the amino acid sequence is not changed. Furthermore, when the purpose is to modify a sequence in a region that regulates gene expression, which is believed to determine many characteristics of cultivars, editing scars are basically unacceptable because the concept of synonymous substitution does not apply. Hence, ideally, an editing method that leaves no scars should be used.
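As a small illustration of what "synonymous" means at the sequence level, the following Python sketch classifies each changed codon between a reference and an edited coding sequence using the standard genetic code. The example sequences are hypothetical and merely mimic an isoleucine-to-valine edit accompanied by one blocking mutation; the function names are likewise illustrative. As argued above, a "synonymous" label here says nothing about whether the change is functionally neutral.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def classify_codon_changes(reference, edited):
    """Return (codon_index, ref_codon, new_codon, kind) for each changed codon."""
    if len(reference) != len(edited) or len(reference) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    changes = []
    for i in range(0, len(reference), 3):
        ref_codon, new_codon = reference[i:i + 3], edited[i:i + 3]
        if ref_codon == new_codon:
            continue
        same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[new_codon]
        changes.append(
            (i // 3, ref_codon, new_codon, "synonymous" if same_aa else "missense")
        )
    return changes


# Hypothetical example: ATT (Ile) -> GTT (Val) is the intended edit,
# CTG (Leu) -> CTC (Leu) plays the role of a blocking mutation.
reference_cds = "ATTCTGGCC"
edited_cds = "GTTCTCGCC"
changes = classify_codon_changes(reference_cds, edited_cds)
```

Here the blocking mutation is reported as synonymous, yet it still alters codon usage, which is exactly the kind of editing scar whose effect is hard to predict.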

Scarless editing by two-step HDR
To meet the demand for genome editing without the editing scars described above, two-step HDR was developed (Paquet et al. 2016). In this strategy, to achieve a single nucleotide substitution, the first HDR changes multiple bases, including the editing scar, and the second HDR retains the desired single nucleotide substitution while restoring the other changed bases, thereby removing the editing scar. This strategy circumvents the difficulty of single nucleotide substitution. However, it does not solve the problem of low HDR efficiency. In human iPS cells, for which this method was developed, screening of 400-600 clones is recommended at each step (Kwart et al. 2017), and since HDR efficiency is similarly low in plants, genome editing with this method is expected to be very laborious.
Subsequently, to overcome the low HDR efficiency, a method combining two-step HDR with positive-negative selection using selection markers was developed (Ikeda et al. 2018). In the first step of this method, the region containing the cut and edit positions is replaced with a selection marker expression cassette, and genome-edited cells are enriched by positive selection. Then, in the second step, the selection marker expression cassette is replaced with the desired sequence, and cells that no longer express the selection marker are enriched by negative selection. The use of selection markers relieves the pressure to achieve highly efficient HDR and makes it possible to freely introduce anything from a single base substitution to insertions, substitutions, and deletions of more than thousands of bases in the genomic region replaced by the selection marker in the first step, without screening a large number of clones (Fig. 2A-2C). In addition, it enables editing at loci away from the guide RNA target, which was not possible with CRISPR/Cas9 or its derivative technologies (Fig. 2D). Thus, the requirement to design an appropriate guide RNA at the editing site is eliminated, and the wider choice of usable guide RNAs allows guide RNAs with low off-target risk to be selected.
This method resembles the method traditionally used to introduce large sequences, in which the intended edit is introduced together with selection markers and the markers are then removed with the Cre/loxP system or another system, but it has several significant advantages over the conventional method. First, two-step HDR leaves no traces of selection marker removal such as loxP sequences. Furthermore, because only the selection marker is introduced in the first step and the target sequence in the second step, variations can be created at the second editing step. This feature can be used to introduce reporter genes or various combinations of genotypes for SNPs scattered over more than kilobases at once (Fig. 3A). Additionally, when the selection markers are introduced into both alleles in the first step, two donor vectors, e.g. genotype A and genotype B, can be used in the second step to produce AA, AB, and BB genotypes simultaneously (Fig. 3B).
At present, there are no reported cases of this method being used in plants. However, the method consists of a simple procedure in which the insertion or removal of a selection marker through HDR, followed by marker selection, is performed twice. Since it has been reported that HDR-mediated insertion of a selection marker, which corresponds to the first step of this method, is already feasible in plants (Miki et al. 2018, Svitashev et al. 2015, Zhang et al. 2023), this scarless editing method would also be feasible in plants by using appropriate selection markers.
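The logic of two-step HDR with marker selection can be made concrete with a toy string model in Python. This is only a sketch under simplifying assumptions: HDR is modeled as replacing whatever lies between two homology arms, selection is not modeled at all, and the sequences, the `<marker>` placeholder, and the function names are all hypothetical.

```python
from itertools import product


def hdr_replace(genome, left_arm, right_arm, payload):
    """Replace the sequence between two homology arms with the donor payload."""
    left_at = genome.find(left_arm)
    if left_at < 0:
        raise ValueError("left homology arm not found")
    start = left_at + len(left_arm)
    end = genome.find(right_arm, start)
    if end < 0:
        raise ValueError("right homology arm not found")
    return genome[:start] + payload + genome[end:]


def second_step_genotypes(donors):
    """Genotypes possible when each of two marker-bearing alleles takes one donor."""
    return sorted({"".join(sorted(pair)) for pair in product(donors, repeat=2)})


# Hypothetical locus: flank + left arm + region to rewrite + right arm + flank.
left_arm, right_arm = "GATTACAGGC", "CCGTAAGCTT"
original_region = "TTGACCA"
desired_region = "TTGGCCA"  # single A-to-G substitution
genome = "AAAC" + left_arm + original_region + right_arm + "GGGT"

# Step 1: swap the region for a selection marker cassette (positive selection).
step1 = hdr_replace(genome, left_arm, right_arm, "<marker>")
# Step 2: swap the marker for the desired sequence (negative selection).
final = hdr_replace(step1, left_arm, right_arm, desired_region)

n_differences = sum(a != b for a, b in zip(genome, final))
genotypes = second_step_genotypes(["A", "B"])
```

Because the second donor carries only the desired sequence between the arms, the final string differs from the original at the intended base alone, with no marker remnant. Passing two donors to `second_step_genotypes` reproduces the AA, AB, and BB outcomes described for bi-allelic marker insertion.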

Future perspectives
Through GWAS, many genomic sequence variations have been identified that can explain simple traits such as changes in metabolic profile (Wallace et al. 2014) and complex quantitative traits such as plant height (Peiffer et al. 2014), abiotic stress tolerance (Chen et al. 2017, Javid et al. 2022, Yasir et al. 2019), and disease tolerance (Kump et al. 2011, Poland et al. 2011). However, the genetic changes detected by GWAS may not be the true causal mutations; they may instead be mutations that do not affect function but are linked to the true causal mutations (Tam et al. 2019). In addition, the mechanisms by which such genome sequence variations affect function remain a black box. While a number of strategies, including statistical methods and genomic functional annotation (Broekema et al. 2020, Cannon and Mohlke 2018, Schaid et al. 2018), have been widely applied to prioritize causal variants (called fine mapping) and their target genes, definitive identification of the mechanism requires direct functional analysis comparing the presence and absence of the relevant mutations. Here, the fact that the majority of mutations detected by GWAS are either synonymous mutations that do not change the amino acid sequence or located in untranslated or intergenic regions makes it difficult to verify the causal variants and their targets and to elucidate the molecular mechanisms. If the functional change is due to an amino acid substitution, it is relatively easy to infer that the function of the protein in question is affected, and there are many options for functional analysis using artificial protein expression systems. However, the effects of other types of genomic changes on physiological phenotypes are expected to involve various mechanisms, such as epigenetic mechanisms, transcriptional regulators, mRNA stability and splicing, translational regulation, and regulation by noncoding RNAs. The evaluation systems are complex, which makes such analysis difficult even in human research, which is considered the most advanced field of genome analysis (Rao et al. 2021). The ultimate solution to this problem is scarless editing, which enables the identification of genetic changes that have beneficial effects on cultivar traits, and of their mechanisms, by comparing isogenic plants that differ only in the intended mutations. Finally, the useful genetic changes thus identified can be introduced into other cultivars quickly by using scarless editing.

Fig. 1. Editing scars in HDR-based editing. A. Synonymous substitutions additionally introduced in HDR-based genome editing intended to change the amino acid sequence encoded in the genome. If the purpose of genome editing is to replace the initial isoleucine with valine, the genomic sequence (a) needs to be edited as in (b) to prevent continued cleavage by CRISPR/Cas9 after the HDR event. In this design, the editing scars are synonymous mutations, called blocking mutations, which are introduced to change the sequence of the CRISPR/Cas9 target. If no synonymous substitutions are introduced, as in (c), the CRISPR/Cas9 target remains after HDR, so it continues to undergo CRISPR/Cas9 cleavage and eventually the sequence is randomly altered by NHEJ errors or MMEJ, resulting in unintended insertions or deletions, as in (d). B. Sequences remaining after removal of selection markers. Selection markers are often used to overcome the low HDR efficiency. Selection markers are unnecessary in the final editing product and are therefore removed by the Cre/loxP or piggyBac systems. However, in the case of Cre/loxP, the loxP sequence remains, and in the piggyBac system, the sequence TTAA remains.

Fig. 2. Two-step HDR editing variations combined with marker selection. Blue arrowhead: CRISPR/Cas9 target; orange line: selection marker expression cassette; green line: intended edit; green dotted line: sequence to be removed.

Fig. 3. Variations of genotypes that can be created in the second HDR step. Blue arrowhead: CRISPR/Cas9 target; orange line: selection marker expression cassette; green line: intended edit; yellow line: reporter gene such as a fluorescence or luminescence reporter.