2025 Volume 100 Article ID: 24-00174
The sequencing of PCR fragments amplified from specific regions of genomes is a fundamental technique in molecular genetics. Sanger sequencing is commonly used for this analysis; however, amplicon sequencing utilizing next-generation sequencing has become widespread. In addition, long-read amplicon sequencing, using Nanopore or PacBio sequencers to analyze long PCR fragments, has emerged, although it is often more expensive than Sanger sequencing. Recently, low-cost commercial services for full-length plasmid DNA sequencing using Nanopore sequencers have been launched in several countries, including Japan. This study explored the potential of these services to sequence long PCR fragments without the need for cloning into plasmid DNA, as cloning long PCR fragments or blunt-end PCR fragments into plasmids is often challenging. PCR fragments of 4–11 kb, amplified from the DFR-B gene involved in the biosynthesis of anthocyanin, with or without Tpn1 transposons in Japanese morning glory (Ipomoea nil), were circularized using T4 DNA ligase and analyzed as templates. Although some inaccuracies in the length of homopolymer stretches were observed, the remaining sequences were obtained without significant errors. This method could potentially reduce the labor and costs associated with cloning, primer synthesis and sequence assembly, thus making it a viable option for the analysis of long PCR fragment sequences. Moreover, this study reconfirmed that Tpn1 transposons are major mutagens in I. nil and demonstrated their transposition in the Violet line, a long-used standard in plant physiology.
Sequencing of genome fragments amplified by PCR is routinely performed in laboratories worldwide. Although Sanger sequencing has traditionally been the standard method used, amplicon sequencing using next-generation sequencing (NGS) has become increasingly common, particularly for microbiome analysis (Caporaso et al., 2011). Moreover, long-read amplicon sequencing, which can sequence PCR products greater than 10 kb in length using technologies such as PacBio HiFi and Oxford Nanopore Technologies sequencing, has recently emerged (Karst et al., 2021). However, the throughput of NGS is excessive for sequencing a single PCR product. Consequently, multiplex analysis, which allows for the simultaneous sequencing of many PCR products, is often employed to reduce the cost per sample. Nonetheless, Sanger sequencing remains the most common approach when analyzing a small number of samples.
Although long-read amplicon sequencing services are available, they tend to be more expensive than Sanger sequencing. However, low-cost full-length plasmid DNA sequencing services using Nanopore technology have recently been launched by several companies (e.g., Eurofins Genomics (USA), AZENTA Life Sciences (USA) and Plasmidsaurus (USA)) and are currently available in Japan. Utilizing these services can substantially reduce the labor, cost and time required compared to Sanger sequencing through primer walking. Moreover, these savings grow as the length of the DNA to be sequenced increases. In this study, a commercial plasmid DNA sequencing service was used to determine the sequence of circularized PCR products, which were created to resemble plasmid DNA through ligation (Fig. 1).
Japanese morning glory, Ipomoea nil, is a unique bioresource of Japan. In domestic lines, Tpn1 family transposons exhibit high transposition activity, which leads to numerous insertion mutations (Inagaki et al., 1994; Fukada-Tanaka et al., 2000; Hoshino et al., 2001, 2009, 2016; Nitasaka, 2003; Iwasaki and Nitasaka, 2006; Morita et al., 2014, 2015). Most of these mutations affect the flower color and patterns as well as the morphology of flowers and leaves, due to selective breeding for horticultural purposes. In the Tokyo Kokei Standard (TKS) line, the whole-genome sequence revealed 339 copies of Tpn1 transposons (Hoshino et al., 2016). The majority are non-autonomous transposons that lack transposase genes, with an average length of 7 kb. Their internal sequences contain fragments of multiple host genes, thereby contributing to their diversification (Takahashi et al., 1999; Kawasaki and Nitasaka, 2004). Subterminal repetitive regions (SRRs) consisting of 104- and 122-bp tandem repeats are located at the 5' and 3' ends, respectively (Hoshino et al., 1995; Kawasaki and Nitasaka, 2004). The length of the SRRs often exceeds what is obtainable in a single Sanger sequencing read, requiring the assembly of multiple reads to determine their full sequence. Moreover, the repetitive nature of these regions makes accurate assembly difficult, thereby posing an obstacle for determining the full-length sequences of the Tpn1 transposons. Of the eleven Tpn1 transposons identified as causing mutations, the full-length sequences of five have not yet been determined (Hoshino et al., 1995, 2009; Nitasaka, 2003; Iwasaki and Nitasaka, 2006; Morita et al., 2015). This study analyzed the full-length sequences of three novel Tpn1 transposons that confer white flowers when inserted into the DFR-B gene, which participates in anthocyanin biosynthesis (Fig. 2), using a commercial service for full-length plasmid DNA sequencing.
The analysis of PCR products circularized with T4 DNA ligase, as shown in Figure 1, by a full-length plasmid DNA sequencing service was validated with two known sequences. First, the sequence of the DFR-B gene region of the TKS line (Fig. 2A), whose whole genome has previously been sequenced (Hoshino et al., 2016), was determined. A total of 1,608 reads with an average length of 1,930 bp were obtained, and 1,585 of these reads were used to produce a 4,130-bp circular sequence (Supplementary Table S1, S2). From this circular sequence, a linear sequence was constructed with the PCR primer sequences at both ends. This sequence was identical to the published TKS genome sequence, except for a 1-bp deletion at a 12-bp polythymine stretch in intron 1 (Fig. 3A). Next, the DFR-B gene sequence of AK006 (Fig. 2B) was analyzed in the same manner as that of TKS, which yielded a 10,544-bp sequence. This line harbors the a3-flecked allele, in which Tpn1 is inserted into the DFR-B gene (Fig. 3A, 3B). Comparison of this allele sequence, which was previously determined in AK009, also known as KK/SSB-4 (Hoshino et al., 1995; Takahashi et al., 1999), with the newly obtained sequence revealed a 1-bp deletion in intron 1, identical to the result for TKS (Fig. 3A), along with 1–2-bp deletions at three positions within the Tpn1 sequence in 11–15-bp polythymine stretches (Supplementary Fig. S1). These results indicate that the accuracy of this long-amplicon sequencing is approximately 99.9%, with sequencing errors tending to occur in homopolymer stretches. These results are in agreement with a previous report that homopolymeric regions or regions with short repeats account for approximately half of all sequencing errors in Nanopore sequencing (Delahaye and Nicolas, 2021).
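The construction of a linear sequence from the circular consensus, and the accuracy estimate, can be illustrated with a minimal Python sketch. It is not the procedure used in this study (sequences were handled in SnapGene and ATGC-MAC); the function names `linearize` and `percent_identity` are hypothetical, and the sketch assumes that both primer sequences are present without errors at the ligation junction and that the forward primer occurs only once in the amplicon. It also assumes that every difference from the reference is a sequencing error.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def linearize(circular: str, fwd_primer: str, rev_primer: str) -> str:
    """Rotate a circular consensus so that it starts with the forward primer
    and ends with the reverse complement of the reverse primer.

    Both strands are tried because the reported orientation is arbitrary.
    """
    circular = circular.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    for candidate in (circular, reverse_complement(circular)):
        # Doubling the string lets a primer match span the arbitrary origin.
        doubled = candidate + candidate
        start = doubled.find(fwd)
        if start == -1:
            continue
        linear = doubled[start:start + len(candidate)]
        if linear.endswith(rev_rc):
            return linear
    raise ValueError("primer pair not found intact at the ligation junction")


def percent_identity(length_bp: int, mismatched_bp: int) -> float:
    """Percentage of bases that agree with the reference (illustrative only)."""
    return 100.0 * (1 - mismatched_bp / length_bp)


# Toy example: amplicon AAGG-CTCTAC-TGCAA, reported from an arbitrary origin.
toy_circular = "CTACTGCAAAAGGCT"
toy_linear = linearize(toy_circular, "AAGG", "TTGCA")

# Figures from the text: 1 bp in 4,130 bp (TKS) and at most 7 bp in 10,544 bp (AK006).
tks_identity = percent_identity(4130, 1)
ak006_identity = percent_identity(10544, 7)
```

Under these assumptions, the TKS sequence corresponds to about 99.98% identity and the AK006 sequence to about 99.93% in the worst case (1 bp in intron 1 plus three 2-bp deletions), consistent with the approximate figure of 99.9%. When the primer sites themselves carry deletions or duplications, exact matching of this kind fails and the ends must be corrected manually.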
The long-read amplicon sequencing method was applied to uncharacterized DFR-B alleles with novel Tpn1 transposons. In AK007 (Fig. 2C), a 6,776-bp Tpn1 transposon, named Tpn7, was identified in intron 3 (Fig. 3A, 3B). Despite being independently isolated, AK010 and AK127 (Fig. 2D, 2E) possessed the same allele with a Tpn1 transposon inserted in intron 3 (Fig. 3A, 3B); this transposon was designated Tpn11. Based on the origins of AK010 and AK127, we speculate that seeds of the Violet line sold by Marutane included seeds derived from chimeric individuals containing the DFR-B::Tpn11 allele. Furthermore, this indicates that Tpn1 transposons are active in the Violet line, a long-used standard in plant physiology. In AK205 (Fig. 2F), another Tpn1 transposon insertion was found in exon 1, and named Tpn20 (Fig. 3A, 3B). AK205 originated from a germinal revertant of the DFR-B::Tpn1 (a3-flecked) allele, as AK205 and AK006 are descendants of K37. A 5-bp insertion, presumed to be a footprint sequence generated by Tpn1 excision, was detected (Fig. 3B). In addition, 3-bp insertion sequences, likely the Tpn7 footprint, were found in all of the analyzed DFR-B mutant lines, which suggests that these lines share a common ancestor plant harboring the DFR-B::Tpn7 allele that bears variegated flowers (Fig. 3B). Aside from the insertion of the Tpn1 transposon, a polymorphism was found in intron 5 (Fig. 3C): a single thymine base in AK006 and AK205 corresponded to a 4-bp sequence in the other lines. However, as no three-base duplication, such as the Tpn1 transposon footprint, was observed in the 4-bp sequence, it is challenging to speculate on the cause of this polymorphism.
To investigate the distribution of the newly identified transposons, a BLASTn search against I. nil sequences was performed. Two copies of Tpn7-like elements were found in the TKS genome sequence, which differed only in the length of the polythymine stretches (Supplementary Table S3). Tpn20 was highly similar to Tpn13, which was inserted in the EFP gene (Morita et al., 2014), as well as to five transposon copies in the TKS genome, but exhibited multiple polymorphisms, including a single-nucleotide substitution, as well as differences in the length of homopolymer sequences (Supplementary Table S4). No closely related transposons were found for Tpn11.
The 1-bp deletion in intron 1 that was observed in the long-amplicon sequence of TKS was also found in the sequences of AK007, AK010, AK127 and AK205 (Fig. 3A). However, this deletion was not observed when the DNAs were analyzed using Sanger sequencing, which suggests that it was an error in the amplicon sequencing (Supplementary Fig. S2). A comparison of the Tpn11 sequences in AK010 and AK127 revealed differences in the length of the polythymine and polyadenine sequences (Supplementary Fig. S3). Although Sanger sequencing of these regions was attempted, overlapping peaks prevented the determination of the exact length of the ~30-bp polyadenine stretch (Supplementary Fig. S3B). This heterogeneity was presumed to be due to PCR error, which resulted in polyadenine sequences of varying lengths being mixed within the amplicons. These results suggest that the length of homopolymer sequences cannot always be accurately determined because of PCR and sequencing errors. However, some homopolymer sequences are consistent with reference sequences and can be accurately analyzed, which indicates that not all such sequences are problematic in determining length (Supplementary Fig. S1). In addition, the sequences obtained from lines other than TKS revealed deletions or duplications at the primer sites on both ends (Supplementary Fig. S4). These sequences could be corrected using the primer sequences and posed no practical issues.
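Because the discrepancies described above were confined to homopolymer stretches, such stretches can be flagged for closer inspection before a consensus sequence is accepted. The following Python sketch is an illustration only and was not part of this study; the function names and the 10-bp threshold are assumptions. It lists long homopolymer runs and uses homopolymer compression (collapsing each run to a single base) to test whether two sequences differ solely in run lengths.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 10) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, length) for each run of a single base
    that is at least min_len bases long."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]


def compress_homopolymers(seq: str) -> str:
    """Collapse every run of identical bases to a single base."""
    return re.sub(r"([ACGT])\1+", r"\1", seq.upper())


def differ_only_in_homopolymers(seq_a: str, seq_b: str) -> bool:
    """True when the sequences are not identical but become identical
    after homopolymer compression."""
    if seq_a.upper() == seq_b.upper():
        return False
    return compress_homopolymers(seq_a) == compress_homopolymers(seq_b)


# Toy example: a 12-bp polythymine stretch read as 11 bp.
reference = "GA" + "T" * 12 + "C"
consensus = "GA" + "T" * 11 + "C"
runs = homopolymer_runs(reference)
only_homopolymer = differ_only_in_homopolymers(reference, consensus)
```

In this toy example, `runs` contains one 12-bp thymine run starting at position 2, and `only_homopolymer` is true. A check of this kind cannot tell whether a length difference arose during PCR or during sequencing, so positions it flags would still need confirmation, for example by Sanger sequencing where the peaks can be resolved.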
In this study, long amplicons greater than 10 kb were successfully sequenced by circularizing them with T4 DNA ligase and using a commercial plasmid DNA full-length analysis service. This method enabled the characterization of SRRs in Tpn1 transposons, which are challenging to sequence using Sanger sequencing. Unlike primer walking with Sanger sequencers, this approach does not require sequencing primers or sequence assembly, thus saving time, labor and cost. Although long-read amplicon sequencing services have recently been introduced in Japan, the cost of plasmid DNA full-length analysis remains less than one-tenth that of these services. Further cost reductions are possible by preparing multiplex libraries and using in-house long-read sequencers; however, this requires an initial investment. The commercial long-read amplicon sequencing service offered by Plasmidsaurus is currently available in the United States and is cost-competitive with plasmid DNA full-length analysis. Nevertheless, until similar services become widely available, the method described in this study remains the most cost-effective option for determining the full-length sequence of long PCR products.
The plants used in this study are listed in Table 1, and their flower phenotypes are presented in Figure 2. AK010 and AK127, which bear white or variegated flowers, were independently isolated from seeds of the commercial Violet line (Marutane, Japan). AK205, which exhibits white or variegated flowers, was isolated from the progeny of a germinal revertant with fully colored flowers, derived from Q1072. AK007 and Q1072 are sublines of line K37. Genomic DNA was extracted using either the Genomic-tip 500/G (QIAGEN, Germany) or the GENE PREP STAR PI-480 (KURABO, Japan), as previously described (Hoshino et al., 2016). The 4–11-kb fragments of the DFR-B sequence were amplified by PCR using the primers DP-LF (5'-TTAACATGAGGGGATTGCATGTCACTTTCA-3') and D3U-LR (5'-CATAAATCTGGTTCGAGTGGCAATCTAACT-3') with Ex Premier DNA Polymerase (TAKARA, Japan). The PCR conditions were as follows: 94 °C for 1 min, followed by 35 cycles of 98 °C for 10 s, 57 °C for 15 s, and 68 °C for 7 min. The PCR product obtained was precipitated with ethanol and dissolved in 17 µl of Milli-Q water. Subsequently, 2 µl of T4 DNA ligase buffer (New England Biolabs, USA) and 1 µl of T4 polynucleotide kinase (10 U/µl; TAKARA) were added, and the reaction was incubated at 37 °C for 30 min to phosphorylate the 5' ends. An additional 1 µl of T4 DNA ligase (200 U/µl; New England Biolabs) was then added, and the mixture was incubated at room temperature for 2 h to circularize the DNA. The circularized DNA was recovered by ethanol precipitation and dissolved in low TE buffer pH 8.0 (10 mM Tris and 0.1 mM EDTA) to prepare 10 µl of a 50 ng/µl solution. Nanopore sequencing was performed using the Plasmid-EZ service, which provides full plasmid DNA sequence analysis (AZENTA, Japan). For sequences suspected to contain sequencing errors, Sanger sequencing was performed using the same DNA that had been used as the template for amplicon sequencing.
Sequences were analyzed using SnapGene version 5.0.8 (GSL Biotech, USA) and ATGC-MAC version 7.2.1 (NIHON SERVER, Japan).
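For planning purposes, the quantities in the protocol above can be checked with a short calculation. The Python sketch below is illustrative only; the variable and function names are hypothetical, and the run time covers programmed hold times alone, ignoring ramping between temperatures and any final hold, neither of which is specified in the text.

```python
from typing import NamedTuple, Sequence


class Step(NamedTuple):
    temp_c: float
    seconds: int


# PCR program as stated in the text.
INITIAL_DENATURATION = Step(94, 60)
CYCLE = (Step(98, 10), Step(57, 15), Step(68, 7 * 60))
N_CYCLES = 35


def programmed_seconds(initial: Step, cycle: Sequence[Step], n_cycles: int) -> int:
    """Sum of programmed hold times, excluding ramping between steps."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


# Phosphorylation and ligation mix, in microliters, as stated in the text.
LIGATION_MIX_UL = {
    "PCR product in Milli-Q water": 17,
    "T4 DNA ligase buffer": 2,
    "T4 polynucleotide kinase": 1,
    "T4 DNA ligase": 1,
}

pcr_seconds = programmed_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES)
pcr_hours = pcr_seconds / 3600
ligation_volume_ul = sum(LIGATION_MIX_UL.values())
submitted_dna_ng = 10 * 50  # 10 µl at 50 ng/µl
```

With these values, the program holds for 15,635 s (about 4.3 h), the final ligation volume is 21 µl, and 500 ng of circularized DNA is submitted per sample.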
Line | Other names | Parental line | Provider | Source |
---|---|---|---|---|
TKS | AK001, Q1065 | Eiji Nitasaka | Transferred from NIGa | |
AK006 | K37 | NIGa | Kichiji Kasahara deposited to NIGa | |
AK007 | a2 | Norio Saito | Obtained from NIGa | |
AK010 | Violet | Keiichi Shimizu | Purchased from Marutane | |
AK127 | Violet | Kiyotoshi Takeno | Purchased from Marutane | |
AK205 | Q1072Rb | Eiji Nitasaka | Transferred from NIGa |
a National Institute of Genetics.
b A germinal revertant of the DFR-B::Tpn1 allele of Q1072, which originates from K37.
The raw sequencing data have been deposited in the DNA Data Bank of Japan (DDBJ)/BioProject under accession number PRJDB18807. The sequences of DFR-B with Tpn1 transposons reported in this paper are available in the DDBJ under the accession numbers LC843356–LC843360.
The authors declare no conflicts of interest.
The authors thank Eiji Nitasaka, Keiichi Shimizu, Kiyotoshi Takeno and the late Norio Saito for providing the I. nil seeds. Moreover, thanks are extended to Kazuyo Ito, Tomoyo Takeuchi, Kiyoko Kuzunishi and Naoko Koyama for their technical assistance, along with the Model Organisms Facility and Trans-Omics Facility, NIBB Trans-Scale Biology Center. JSPS KAKENHI supported part of this study with grants to A. H. (21K06239) and S. N. (23KJ1004).