2022 Volume 47 Issue 2 Pages 47-58
Bacillus thuringiensis (Bt) has been used as sprayable pesticides for many decades. Bt strains utilized in these products produce multiple insecticidal proteins to complement a narrow insect specificity of each protein. In the late 1990s, genes encoding Bt insecticidal proteins were expressed in crop plants such as cotton and corn to protect these crops from insect damage. The first Bt protein used in transgenic cotton was Cry1Ac to control Heliothis virescens (tobacco budworm). Cry1Ab was applied to corn to control Ostrinia nubilalis (European corn borer). Since these insects have developed resistance to Cry1Ac and Cry1Ab, new Bt proteins are required to overcome the resistance. In order to protect corn furthermore, it is desired to control Diabrotica virgifera (Western corn rootworm), Helicoverpa zea (corn earworm) and Spodoptera frugiperda (fall armyworm). Recently, many new Bt insecticidal proteins have been discovered, but most of them require protein engineering to meet the high activity standard for commercialization. The engineering process for higher activity necessary for Bt crops is called optimization. The seed industry has been optimizing Bt insecticidal proteins to improve their insecticidal activity. In this review, several optimization projects, which have led to substantial activity increases of Bt insecticidal proteins, are described.
There are many microbial organisms pathogenic to insects. Almost all of them produce toxins that kill insects exclusively. One of these microbial insect pathogens is Bacillus thuringiensis (Bt). Bt was discovered in 1901 in Japan as a pathogen of silkworm and later used broadly as a sprayable insecticide.1) The most widely commercialized sprayable Bt formulation utilizes the HD1 strain of Bt subspecies kurstaki. This Bt strain has quite a few insecticidal proteins (toxins) such as 135 kDa Cry1Aa, Cry1Ab, Cry1Ac and Cry1Ia protoxins, 67 kDa Cry2Aa and Cry2Ab naturally matured toxins and possibly more. Since Bt insecticidal proteins are produced during the sporulation stage and crystalized in Bt cells (Supplemental Fig. S1), they are called Cry for crystal proteins or toxins. When Bt matures during fermentation, the cells lyse and release free spores and crystals into the culture medium. The spore and crystal complex is collected and formulated into sprayable products. Each of these HD1 insecticidal proteins has a unique host specificity. For example, Cry1Aa is particularly active to Bombyx mori (silkworm) and Trichoplusia ni (cabbage looper) but not active to H. virescens, while Cry1Ac is specific to H. virescens and H. zea but not active to B. mori. Cry2A was discovered in the HD1 strain by Yamamoto and McLaughlin,2) and Cry2Ab was later used in transgenic cotton to control Helicoverpa armigera (cotton bollworm) because of its high activity to this insect species.
Bt Cry insecticidal proteins are stomach poisons that must be ingested by insects and activated by proteases in the insect’s digestive system. The Cry protein is a powerful feeding inhibitor. It works well for transgenic Bt crop applications but not so well for sprayable products. Insects stop feeding after they ingest the first bite of Bt Cry proteins sprayed on plants. If the bite does not deliver a lethal dose, it is difficult to kill the insects. The efficacy of a sprayable Bt formulation persists for only a few days after spraying. Insects that survive the first bite can recover and resume feeding. This is one reason why sprayable Bt insecticides remain a niche product. However, the feeding inhibition works well for transgenic crop applications. As long as insects do not feed, the crop is protected. Bt insecticidal Cry proteins were applied to transgenic cotton, corn and soybean. The first Cry protein used in cotton was Cry1Ac cloned by Adang et al.,3) and in corn, it was Cry1Ab cloned by Höfte et al.4) Cry1Ac is highly active to H. virescens and H. zea, important cotton pests in the U.S. Cry1Ab is active to O. nubilalis, a major corn pest. Their specific activities to the main target pests appear to be in the range of EC50=10 ppm or lower (i.e., higher activity) in plant tissue. Since then, it seems that this high activity level has become a preferred benchmark for other Bt insecticidal proteins used in transgenic crops. After O. nubilalis, S. frugiperda became an important pest for corn. Cry1Fa replaced Cry1Ab to overcome this problem, as the activity of Cry1Fa against S. frugiperda is within the activity benchmark. Cry1Ab is only moderately active to S. frugiperda. None of the Bt insecticidal proteins mentioned so far is active to a root-feeding corn pest, D. virgifera. To make corn resistant to this soil insect, Bt insecticidal proteins such as Cry3Ab and Cry34/35 were utilized. Since D. virgifera is difficult to control with Bt Cry proteins, even with Cry3Ab and Cry34/35, it is desirable to discover a new potent protein or to optimize an existing one.
Since the early 2000s, there have been extensive efforts to find new insecticidal toxins from microorganisms, especially those of Bt. Efforts to find new Bt toxins were intensified after Next Generation Sequencing (NGS) became widely used. Since most, if not all, Bt insecticidal protein genes are on extrachromosomal plasmids usually huge ones as large as 150 kb, large plasmids of numerous Bt isolates were systematically sequenced to find genes encoding typical Bt Cry amino acid sequences. This discovery method of using NGS has dominated old methods such as biochemical identification and/or isolation of Bt Cry proteins. So far, many, if not all, newly discovered Cry proteins are not as active as those existing Cry proteins utilized in transgenic plants and/or sprayable pesticides. Therefore, the industry shifted their efforts to improve the activity of existing, or even newly discovered, insecticidal proteins of which activities are not as high as those in the commercial products. This process of improving the activity by protein engineering is called “optimization” and utilizes various engineering methods as described in this review. These methods include domain swapping, DNA shuffling and saturation mutagenesis. The latter two methods are employed often in directed evolution. Directed evolution mutates a gene to produce many mutant progenies, which are screened in the laboratory for higher fitness to a desired trait, such as higher insecticidal activity. Therefore, it is called “directed.” The other protein engineering technique is rational design based on the information of structure and function relationship. The rational design approach is not reviewed in this article as there have been no noteworthy success cases. A possible reason for the lack of success is that the mode of action of Bt Cry insecticidal protein has not been fully resolved.
This article is not a comprehensive review of Bt insecticidal proteins unlike the previous article reviewing Bt research and development up to the year 2000.1) Rather, it is focused to review certain methods of protein engineering applied to enhance the insecticidal activity of Bt Cry proteins. Most of the contents in this article are based on the research conducted after 2000 by us, i.e., my colleagues and I working together for Sandoz Agro, Maxygen Inc., Pioneer Hi Bred International and DuPont Pioneer. All information in this review has been published previously, but it was mostly in oral presentations at scientific meetings, issued patents and patent applications. Therefore, some experimental details are included in this review for the benefit of understanding the process of Bt Cry protein optimization.
The most important information for optimizing the insecticidal protein is the three-dimensional protein structure. The first X-ray structure of a Bt Cry protein, Cry3Aa, was determined in 1991 by Li et al.5) The structure of this protein revealed three distinctive domains, called Domain I, Domain II and Domain III as shown in Fig. 1. Domain I is composed of 7 α-helices and presumed as the membrane spanning domain. Domain II and Domain III are mostly composed of repeated β-strands, and their function is considered to be receptor binding. Domain II has three bundles of β-strands forming a triangular prism, and the receptor binds to the bottom of the prism as shown with the red circle on the structure of Cry3Aa in Fig. 1. Upon binding to the receptor, Bt Cry protein is supposed to insert its Domain I into the insect cell membrane to form an ion channel. Since Cry3Aa, the X-ray structures of Cry1Aa,6) Cry2Aa7) and several other Cry protein structures have been resolved. Surprisingly, all those structures are similar to Cry3Aa’s three-domain structure, even though the primary amino acid sequences are quite different.
Three-domain type Bt insecticidal proteins bind to a receptor on the insect midgut epithelial cells. In order to find the insect receptor to Bt Cry proteins, Brush Border Membrane Vesicles (BBMV) were isolated from insect midgut tissue and allowed to bind to Bt Cry proteins. Then, the “receptor” proteins or more accurately “any BBMV proteins, which bind to the Bt Cry protein” were identified. Those presumably receptor proteins found by this BBMV method include cadherin, aminopeptidase and alkaline phosphatase.
Lately, more direct methods of functionally identifying the receptor for Bt Cry proteins were reported. In 2012, Atsumi et al.8) found one amino acid deletion in ABCC2 (ATP-Binding Cassette transporter, C-family, member 2) of Cry1Ab-resistant B. mori larvae. The Cry1Ab-resistance in silkworm was reverted to sensitive by replacing the mutated ABCC2 with a non-mutated one. Endo et al.9) cloned and expressed B. mori ABCC2 (Bm-ABCC2) in the HEK293 mammalian cell line and found that Cry1Aa was cytotoxic to the recombinant HEK293 cells. Cry1Aa was not toxic to HEK293 without Bm-ABCC2 as this cell line is supposed to have no indigenous receptor to insect-specific toxins. Similar experiments were conducted with the Sf9 insect cell line derived from S. frugiperda pupal ovarian tissue. In Sf9, various ABC transporter genes were cloned and expressed.10,11) Without the receptor proteins expressed, Sf9 cells were not sensitive to certain Cry toxins such as Cry1A, Cry1Ba, Cry1Da and Cry2Ab. By this cell assay, these reports indicated that Cry1Fa’s receptor in S. frugiperda was ABCC2. Additionally, receptors of Cry1Ac (Cry1Ab, which has Domain I and II highly homologous to those of Cry1Ac), Cry1Ba and Cry2Ab in S. frugiperda were determined to be ABCC3 (ABCC2), ABCB1 and ABCA3, respectively. Figure 2 shows a structural model of S. frugiperda ABCC2 (Sf-ABCC2). As shown in the figure, Sf-ABCC2 has two large Extra-Cellular Loops (ECL1 and ECL4). These ECLs, or at least ECL4, are likely to be involved in the binding to Cry1Fa. Banerjee et al.10) found that the ABCC2 gene of a Cry1Fa-resistant S. frugiperda colony was truncated before the 7th transmembrane α-helix resulting in the loss of ECL4 (refer to the structure of ABCC2 shown in Fig. 2). The hypothesis that ECL1 and ECL4 are involved in the binding was also supported by the report of Sato et al.12) using B. mori ABCC2 and Cry1Aa. These extra-long ECLs may intrude into the Domain II cavity of Bt Cry protein upon binding (Fig. 1, Cry3Aa red circle).
We still need more data to assure that the whole insect assay can be replaced with the cell assay. The cell assay system with one receptor gene cloned and expressed may not represent the insect midgut cells, which are the target of Bt Cry protein. Some Bt Cry proteins, especially their Domain III may bind to the secondary receptor to have higher activity. In this case, the cell system requires two receptor genes cloned and expressed. For this reason, the cell assay cannot replace the whole insect assay altogether but may be useful as a primary screen. The cell assay is very attractive as it can process a large number of samples in high density plates such as 96-well plates or even 384-well plates. Additionally, the cytotoxicity can be measured by chemical cell viability assay using a plate reader.13) Furthermore, the cell responses can be observed after a short incubation while the whole insect assay needs days till insects show the full reactions to Bt insecticidal proteins.
When a new insecticidal protein is discovered or produced by protein optimization, the insecticidal activity is measured by bioassay against desired target insects. In the case of protein optimization, the activities among engineered proteins, particularly between the parent and engineered progeny, must be compared accurately. Since our current interest is transgenic Bt crops, the bioassay is done with an artificial diet in which insecticidal proteins are homogeneously mixed in to simulate the plant tissue. Placing protein samples on the surface of the diet is applicable to the sprayable products but does not represent the transgenic crop. The diet mixing has another advantage for assay data accuracy. Some insects, such as H. zea and D. virgifera tend to dig deep into the surface-contaminated diet to avoid high toxin concentration on the diet surface. The diet mixing method overcomes this problem.
In our diet-mixing assay, insects are allowed to feed on the diet, and the responses, including feeding inhibition and mortality, are determined. When the insecticidal protein is administered orally, the insect response follows a sigmoid function of the dose as shown in Fig. 3-A. At low protein concentrations, the response increases gradually. When the dose (protein concentration in the diet) exceeds a certain point, the response increases steeply until the dose reaches a high point from which the response flattens out toward 100%. In our assay, the activity level for each protein is expressed in EC50 (50% Effective Concentration). To find the EC50 on the sigmoid dose–response curve, several data points, preferably 6, in the sharply ascending range are required. We use EC50 rather than LC50 (50% Lethal Concentration) to determine the activity by observing not only mortality but also feeding (growth) inhibition, because the combined factors are important for Bt crops. When the insect responses are expressed by probit function,14) the dose–response becomes linear (Fig. 3-B). In this case, EC50 can be assessed with only two data points, although it is desirable to have more points.
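The probit linearization can be illustrated with a short sketch. It is a simplified illustration, not the analysis used for the data in this review: it assumes the dose axis is log10 of the concentration in ppm and fits an unweighted least-squares line, whereas classical probit analysis uses a weighted maximum-likelihood fit. The function name `probit_ec50` and the example doses are hypothetical.

```python
import math
from statistics import NormalDist


def probit_ec50(doses_ppm, fractions_responding):
    """Estimate EC50 from a straight-line fit of probit(response) vs log10(dose).

    Assumptions (not from the source): log10 dose scale, unweighted
    ordinary least squares. Responses of exactly 0 or 1 are skipped
    because the probit transform is undefined there.
    """
    nd = NormalDist()  # standard normal distribution
    xs, ys = [], []
    for dose, frac in zip(doses_ppm, fractions_responding):
        if 0.0 < frac < 1.0:
            xs.append(math.log10(dose))
            ys.append(nd.inv_cdf(frac))  # probit = inverse normal CDF
    if len(xs) < 2:
        raise ValueError("need at least two doses with 0 < response < 1")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("doses must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # A 50% response corresponds to probit = 0, so log10(EC50) = -intercept/slope.
    ec50 = 10.0 ** (-intercept / slope)
    return ec50, slope, intercept


if __name__ == "__main__":
    # Hypothetical doses (ppm) and fractional responses, for illustration only.
    doses = [2.5, 5.0, 10.0, 20.0, 40.0]
    responses = [0.10, 0.30, 0.55, 0.80, 0.95]
    ec50, slope, intercept = probit_ec50(doses, responses)
    print(f"EC50 ~ {ec50:.1f} ppm (slope {slope:.2f})")
```

Because the transformed relationship is a straight line, two informative doses are enough to define it, which is why the fit only requires two points with responses strictly between 0% and 100%.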
To optimize an insecticidal protein, it is essential to screen a large number of samples in a high throughput mode. For example, DNA shuffling can produce a very high diversity of progeny, called variants, numbering in the millions. In order to screen so many variants, special measures were developed. First, all assays were done in 96-well microtiter plates as described by Cong et al.15) In each well of the plates, 50 µL of molten artificial insect diet was mixed with 10 µL of protein samples using a 96-channel liquid handling robot, such as Agilent Bravo (Santa Clara, CA, USA) or Hamilton Microlab NIMBUS (Reno, NV, USA). The diet was made with a low temperature-gelling agarose (30–37°C gelling temperature) to prevent the diet from solidifying at the holding temperature (40–50°C) until the protein samples were mixed into the diet. The diet solidified (i.e., <37°C) soon after the mixing, indicating that the sample proteins were not exposed to a high temperature and could remain active. Since all wells were prepared at once with a 96-channel liquid handling robot, there should be no differences in the diet preparation among the 96 wells. After the mixing, several neonate larvae, usually 3 to 5 larvae, were placed in each well with a paint brush. Later, as described in our patent application,16) a large particle flow cytometer, COPAS™ (Complex Object Parametric Analyzer and Sorter) of Union Biometrica (Holliston, MA, USA) was used to place an exact number of insect larvae or eggs in each well. After infesting larvae, the plates were sealed with air permeable plastic film, and the sealed plates were incubated at 28°C for Lepidoptera spp. or 25°C for Diabrotica spp. for 4 to 5 days. The insect response was determined by observing insect size and mortality using numerical values from 0 to 3. Score 0 means no adverse reaction; Score 1, feeding inhibition (smaller insect size than normal larvae); Score 2, some mortality and/or strong feeding inhibition; Score 3, 100% mortality among multiple larvae placed in one well. Scores from 6 wells with the same dose were used to determine the insect response for the particular dose. To determine the percent response at one dose, all scores of the 6 wells were added. A total score of 18 (Score 3 × 6 wells) corresponded to 100% effectiveness, and a total score of 0 (Score 0 in all 6 wells) corresponded to 0% effectiveness. The percent effectiveness from this scoring system was plotted with several different doses, 2 to 6 doses, and EC50 was determined by probit analysis.
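The conversion from well scores to percent effectiveness is simple arithmetic and can be written out as a minimal sketch. The function name `percent_effectiveness` and the example scores are hypothetical; the 0 to 3 scale and the 6 replicate wells per dose follow the description above.

```python
def percent_effectiveness(well_scores, max_score=3):
    """Convert per-well scores (0..max_score) at one dose into a percent response.

    The total score is divided by the maximum possible total
    (max_score x number of wells), e.g. 18 for six wells scored 0 to 3.
    """
    if not well_scores:
        raise ValueError("at least one well score is required")
    for score in well_scores:
        if score not in range(max_score + 1):
            raise ValueError(f"score {score!r} is outside 0..{max_score}")
    return 100.0 * sum(well_scores) / (max_score * len(well_scores))


if __name__ == "__main__":
    # Hypothetical scores from six wells at one dose.
    scores = [3, 3, 2, 3, 2, 3]
    print(f"{percent_effectiveness(scores):.1f}% effective")
```

The percent values obtained at each dose are the responses that would then be passed to a probit fit to estimate EC50.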
2.2. Production of Bt Cry proteins in Bt
Diversified insecticidal proteins produced by protein optimization such as DNA shuffling were prepared for screening in a high throughput mode using 96-deep well plates. Engineered insecticidal proteins were cloned and expressed in an appropriate host microorganism. In our early work such as shuffling Cry1Ca, we used a Bt strain that contains no insecticidal protein genes. The host Bt had been made by plasmid curing and was confirmed to have no insecticidal protein expressed before an engineered protein gene was cloned. The method of making the protein in Bt was reported by Cong et al.15) In summary, an engineered gene was cloned in an Escherichia coli-Bacillus cereus shuttle vector called pMAXY3206. The plasmid construction (i.e., cloning the engineered gene) was made in a rec-minus E. coli strain such as XL-1 Blue, and the plasmid from the rec-minus E. coli was transferred to a dam/dcm-minus E. coli strain to produce methylation-free DNA to transform Bt. This Bt cloning process is illustrated in Supplemental Fig. S2. The transformed Bt was grown in a sporulation medium until it produced spores and crystals. Bt crystals were dissolved at pH 10 with a mixture of NaOH and 2% mercaptoethanol and collected by acid precipitation. Bt Cry1-type protoxins were effectively precipitated at pH 4.4 and re-solubilized in a buffer at pH 8.
2.3. Production of Bt Cry proteins in E. coli
Oeda et al.17,18) reported that Bt Cry proteins expressed in E. coli were as insecticidal as those produced in Bt. Therefore, we changed the expression host from Bt to E. coli to adopt a high throughput method for protein purification. The Bt method requires repeated centrifugations, which are expensive to automate, while the E. coli method can use an existing 96-channel liquid handling robot with limited centrifugations. For expressing Bt Cry protein genes in E. coli, the genes were cloned in the pMAL vector from New England Biolabs (Ipswich, MA, USA). Bt genes were synthesized using E. coli-preferred codons. This pMAL expression system increases the solubility of Cry protein with MBP (Maltose Binding Protein) in E. coli cytoplasm. MBP-Cry was accumulated in E. coli cells as a soluble protein without crystallization while being incubated at 16°C. E. coli cells were harvested and lysed to release MBP-Cry. The pMAL vector had been modified to attach a 6X His tag to the N-terminus of MBP so that His tagged MBP-Cry could be purified by affinity chromatography using Ni-NTA agarose. Ni-NTA agarose was packed in a 96-deep well filter plate to run the chromatography on a liquid handling robot platform. Agarose-bound His-MBP-Cry was eluted with 200 mM histidine, and histidine was removed by 96-channel Sephadex G25 spin columns in 96-deep well filter plates. The procedure of this protein production and purification was disclosed in our patent application16) and illustrated in Supplemental Fig. S3.
2.4. Analysis of engineered Bt Cry proteins before screening
After engineered proteins were purified, the purity and concentration were analyzed by capillary electrophoresis using LabChip™ GXII Touch HT (Perkin Elmer, Waltham, MA, USA). This method of determining the protein concentration by capillary electrophoresis has advantages for high throughput processing. LabChip handles protein samples in 96-well plates and has a wide linear range to measure protein concentrations from 0 to at least 10 mg/mL. This means that the protein in the final assay sample can be concentrated as high as 10 mg/mL. The high protein concentration was necessary when a parent protein with a low starting activity was optimized and assayed in insects. We have optimized a protein with a starting activity as low as EC50=1,000 ppm. In this case, the median dose in the diet should be around 1 mg/mL, meaning the concentration of protein samples should be more than 6 mg/mL (diluted 6-fold when mixed with the diet). Before LabChip, we used SDS-PAGE to determine the protein concentration. Since SDS-PAGE has non-linear responses to variable protein concentrations, it requires multiple concentrations of the reference protein and the assay samples. Accurate measurement of the protein concentration of assay samples is very important as it directly affects the assay accuracy.
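The dilution arithmetic behind the 6 mg/mL figure can be sketched as follows. The sketch assumes the 10 µL sample and 50 µL diet volumes given for the 96-well assay and treats 1 ppm as approximately 1 µg/mL of diet, which is an approximation consistent with the 1,000 ppm to 1 mg/mL conversion in the text; the function names are hypothetical.

```python
def dilution_factor(sample_ul=10.0, diet_ul=50.0):
    """Fold dilution of the protein sample after mixing into the diet."""
    return (sample_ul + diet_ul) / sample_ul


def required_stock_mg_per_ml(target_in_diet_ppm, sample_ul=10.0, diet_ul=50.0):
    """Stock concentration needed to reach a target dose in the diet.

    Assumes 1 ppm is approximately 1 ug/mL, i.e. 1,000 ppm ~ 1 mg/mL.
    """
    target_mg_per_ml = target_in_diet_ppm / 1000.0
    return target_mg_per_ml * dilution_factor(sample_ul, diet_ul)


if __name__ == "__main__":
    # Parent protein with EC50 = 1,000 ppm: median dose ~ 1 mg/mL in the diet.
    stock = required_stock_mg_per_ml(1000.0)
    print(f"dilution: {dilution_factor():.0f}-fold, stock needed: {stock:.0f} mg/mL")
```

Doses above the median require proportionally more concentrated stocks, which is why a measurement method that stays linear up to about 10 mg/mL matters for low-activity parents.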
Bt swaps domains of insecticidal Cry proteins to gain a new host specificity and/or higher toxicity against already susceptible insect species. Figure 4 shows the domain diversity of the Cry1B family. The domain diversity of all Cry1s is shown in Supplemental Fig. S10–12. Wang et al.11) found Cry1B.868 bound to Sf-ABCB1. Cry1B.868 is a chimeric protein made of Cry1Be Domain I and II and Cry1Ca Domain III. It is not known which domain, Domain II or Domain III, of Cry1B binds to Sf-ABCB1, but it is highly likely to be Domain II for the following reasons. (1) Cry1Ba is active against coleopteran insects as well as lepidopteran insects.19) (2) Recently, it was reported that Cry1Ia’s receptor is ABCB1,20) and Domain IIs of Cry1B and Cry1I are homologous but not Domain III. (3) The receptor for Cry1Ca is not known, but it has no activity against coleopteran insects indicating it is likely that Cry1Ca Domain III in Cry1B.868 is not involved in the binding to ABCB1. (4) Coleoptera-active Cry3Aa binds to ABCB1 as its receptor.12) (5) Cry3Aa’s coleopteran specificity was maintained after its Domain III was replaced with that of Cry1Ab (eCry3.1Ab)21) indicating Cry3Aa’s Domain II determines the coleopteran specificity.
It seems Domain III further defines the insect specificity of some Cry1-type proteins such as Cry1B at the insect’s genus level. This suggests Domain III binds to the secondary receptor. Cry1Bs with a Cry1Ac-type Domain III are active to Helicoverpa spp. Cry1Ac is highly specific to H. virescens and H. zea.3) On the other hand, Cry1Cb is active against Spodoptera species such as S. exigua (beet armyworm) and S. frugiperda.22) When Domain II–III junction sequences were compared among known Cry1-type proteins (Supplemental Fig. S4), it was found that the junction sequences are conserved allowing the recombination. The homologous region extends from the C-terminus of Domain II to the N-terminus of Domain III about 50 amino acid residues. Indeed, Bt takes advantage of this homologous region to swap Domain III.
Cry1-type proteins are 135 kDa protoxins, which require the activation by proteases in the insect digestive system. A fully matured toxin does not have protoxin’s C-terminal region of about 60 kDa. This region is called protoxin and is highly conserved among all Cry1-type proteins. Taking advantage of those homologous regions, one at the junction of Domain II and Domain III and the other at the beginning of the protoxin region, Domains III of unknown Bt Cry proteins can be amplified by PCR with a few primers mixed in the reaction mixture. While we were searching for novel Domain III sequences in our Bt culture collection, we found a free floating Domain III, which was similar to Cry1Ac Domain III, and a protoxin sequence without Domain I and II23) as shown in Fig. 5. We believe this is a strong evidence that Bt is swapping domains, particularly Domain III, by homologous recombination.
Domain swapping of Bt Cry proteins by homologous recombination was done in E. coli. Bosch et al.24) cloned genes encoding Cry1Ab and Cry1Ca in one plasmid as shown in Supplemental Fig. S5. This plasmid was cut open with BamHI and NotI to force the recombination in a recA-plus E. coli. One recombinant Cry protein called H04 having Domain I and II of Cry1Ab and Domain III of Cry1Ca showed remarkably improved activity against S. exigua, which is difficult to control with sprayable Bt formulations based on HD1. Sandoz Agro (now Syngenta) recognized the utility of this domain-swapped Cry protein in 1995 and acquired the technology. The gene encoding H04 was integrated in the chromosome of Sandoz Agro’s high yield Bt strain, and field trials were conducted in several sites in the U.S. including California.25) This open field trial of a GMO (Genetically Modified Organism) material in California was a historic event as the second environmental release of a GMO material after a long moratorium caused by the controversial ice-minus Pseudomonas trial. After the recombinant Bt was sprayed, leaf samples were taken back to the laboratory and checked for protection from insect damage by artificially infesting S. exigua larvae. The result was impressive as no insect damage was seen in plants treated with the recombinant Bt as shown in Fig. 6. After the non-GMO Bt insecticide product was sprayed, the efficacy lasted only a few days as Bt was inactivated by UV and by water condensation wetting the leaf surface. Since the Bt formulation with H04 had a higher starting activity than Bt without H04, the efficacy persisted at least twice as long as that of Bt without H04.
Since then, similar domain-swapped Bt Cry proteins have been developed for Bt corn, confirming the utility of this technology. Syngenta’s Agrisure® Duracade™ corn contains eCry3.1Ab, which is a hybrid of Cry3Aa Domain I and II and Cry1Ab Domain III.21) This chimeric protein has an enhanced activity level against D. virgifera. Monsanto’s (now Bayer Crop Science) Yieldgard VT Pro™ corn utilizes Cry1A.105. Cry1A.105 has Cry1Ac (Cry1Ab) Domain I and II, Cry1Fa Domain III and Cry1Ac protoxin.26) Monsanto made several other chimeric proteins by replacing Domain III of Cry1Ab, Cry1Ah and Cry1Be to produce Cry1A.107 (Cry1Ab DmI/II-Cry1Ac DmIII), Cry1A.2 (Cry1Ah DmI-Cry1Ac DmII-Cry1Ca DmIII) and Cry1B.2 (Cry1Be DmI/II-Cry1Ka DmIII).27) The industry is trying to enhance the activity of existing Bt Cry proteins by replacing Domain III with a different one, as was done by Bosch et al.24) in E. coli in the 1990s and is being done by Bt naturally. It is interesting to see that the technology of the 1990s is still used in current transgenic crops after so many years.
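The domain compositions of the chimeric proteins named in this section can be collected into a small data structure, which makes it easy to see which Domain III donors recur. This is only a bookkeeping sketch of the compositions stated in the text; the class and function names are hypothetical, and Cry1A.105 is recorded with Cry1Ac as the Domain I and II donor although the text notes that these domains are essentially shared with Cry1Ab.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chimera:
    """Donor protein for each of the three domains of a chimeric Cry protein."""
    name: str
    domain1: str
    domain2: str
    domain3: str


# Compositions as stated in the text of this section.
CHIMERAS = [
    Chimera("H04", "Cry1Ab", "Cry1Ab", "Cry1Ca"),
    Chimera("Cry1B.868", "Cry1Be", "Cry1Be", "Cry1Ca"),
    Chimera("eCry3.1Ab", "Cry3Aa", "Cry3Aa", "Cry1Ab"),
    Chimera("Cry1A.105", "Cry1Ac", "Cry1Ac", "Cry1Fa"),  # DI/II: Cry1Ac (Cry1Ab)
    Chimera("Cry1A.107", "Cry1Ab", "Cry1Ab", "Cry1Ac"),
    Chimera("Cry1A.2", "Cry1Ah", "Cry1Ac", "Cry1Ca"),
    Chimera("Cry1B.2", "Cry1Be", "Cry1Be", "Cry1Ka"),
]


def group_by_domain3(chimeras):
    """Map each Domain III donor to the names of chimeras that carry it."""
    groups = defaultdict(list)
    for chimera in chimeras:
        groups[chimera.domain3].append(chimera.name)
    return dict(groups)


if __name__ == "__main__":
    for donor, names in sorted(group_by_domain3(CHIMERAS).items()):
        print(f"Domain III from {donor}: {', '.join(names)}")
```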
The DNA shuffling method for directed evolution was first published in 1994 by Stemmer.28) He founded a company called Maxygen in 1997, and I joined Maxygen in 1998 to shuffle Bt Cry protein genes. DNA shuffling has two major applications, single gene shuffling and family gene shuffling. As shown in Supplemental Fig. S6, the single gene shuffling utilizes error prone PCR amplification. The genes amplified by PCR under a high mutational condition are fragmented by endonuclease down to approx. 100 bp. Then, these fragments are allowed to anneal, and each annealed fragment is extended by PCR without primers. This annealing-extension cycle is repeated until a desired size (the target gene size) is obtained. This cycle is followed by the final PCR step called rescue reaction. The rescue reaction amplifies the shuffled genes with 5′ and 3′-end primers, which anneal to the parent gene to “rescue” the whole shuffled genes derived from the parent gene. Since fragments are assembled by 3′-end homology (see the annealing step in Fig. S6), it is likely the rescued DNA contains all fragments in the proper orientation. Supplemental Fig. S6 shows a case of the single gene shuffling in which the shuffling reduces the number of mutations. On the other hand, the shuffling can increase mutation counts by combining different mutations among variants. Successful examples of single gene shuffling, with Cry1Ca and Cry3Aa, are described in Sections 4.2 and 4.3, respectively.
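How shuffling can either reduce or combine mutations is easier to see in a toy simulation. The sketch below is an in-silico cartoon, not a model of the laboratory protocol: it assumes all variants have the same length and stay perfectly aligned, replaces fragment annealing and primerless extension with picking a random template for each roughly 100 bp segment, and uses hypothetical function names and an arbitrary mutation rate.

```python
import random

BASES = "ACGT"


def error_prone_copy(seq, rate, rng):
    """Copy a sequence, substituting each base with probability `rate`."""
    copied = []
    for base in seq:
        if rng.random() < rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def shuffle_pool(pool, frag_len, rng):
    """Build one recombinant by taking each segment from a random pool member.

    Toy stand-in for fragmentation and reassembly: segments keep their
    original position, so the product has the parental length.
    """
    length = len(pool[0])
    if any(len(seq) != length for seq in pool):
        raise ValueError("all pool members must have the same length")
    pieces = []
    for start in range(0, length, frag_len):
        template = rng.choice(pool)
        pieces.append(template[start:start + frag_len])
    return "".join(pieces)


def count_mutations(parent, variant):
    """Number of positions at which the variant differs from the parent."""
    return sum(a != b for a, b in zip(parent, variant))


if __name__ == "__main__":
    rng = random.Random(1)
    parent = "".join(rng.choice(BASES) for _ in range(600))
    # Mutagenized pool (arbitrary 1% substitution rate), then ten recombinants.
    pool = [error_prone_copy(parent, 0.01, rng) for _ in range(8)]
    progeny = [shuffle_pool(pool, 100, rng) for _ in range(10)]
    print("pool mutations:   ", [count_mutations(parent, v) for v in pool])
    print("progeny mutations:", [count_mutations(parent, v) for v in progeny])
```

Running it shows progeny with both fewer and more mutations than typical pool members, depending on which segments happen to be combined.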
Family gene shuffling starts with homologous genes. As shown in Supplemental Fig. S6, the shuffling produces a library consisting of multiple genes with a certain homology. These shuffled progeny genes are called variants. Since Bt Cry proteins are widely diversified yet somewhat homologous, it is ideal to shuffle the genes encoding Bt Cry proteins. However, the library has to be screened to find one or more variants with desired traits. Usually a desired trait is higher activity or new insect specificity. In our practice, it was difficult to create a new insect specificity, which did not exist in the parent proteins. However, we have cases of the family shuffling that produced higher activity than any of the parent proteins. One good example of family gene shuffling is Cry1B as described in Section 4.4.
4.2. Single gene shuffling of Cry1Ca
In order to demonstrate the feasibility of shuffling a single gene encoding a Bt Cry protein, Cry1Ca was chosen as it has some activity against certain Spodoptera species, such as S. exigua, but is not highly active to S. frugiperda.15) Since S. exigua is more sensitive to Cry1Ca than S. frugiperda, we used S. exigua to screen the shuffled proteins as it requires less protein. Furthermore, we used a protein expression and purification system in Bt as described in Section 2.2. The shuffled proteins were screened in two tiers. The first tier screening was done with one high dose to eliminate variants having little or no activity. Then, those variants showing the activity level above a certain threshold were screened at multiple doses around EC50 of the parent Cry1Ca. This second tier screening revealed about 20 variants showing EC50 significantly lower (i.e., higher activity) than that of the parent Cry1Ca. Those variants showing higher activity levels were sequenced to identify mutations, and mutated sites were mapped on a 3D structural model as shown in Fig. 7. Interestingly, most mutations were within the same amino acid groups such as V129I, F133L and I485V suggesting no radical structural changes. Two mutations in the Domain II loop regions, D312G and R403G, reduced the size and charge of the amino acid side chains of Asp (D) and Arg (R). These mutation sites are presumably solvent exposed. It is likely that the mutations cause no significant modifications to the backbone structure. A take home message was that the insecticidal activity could be enhanced without changing the overall protein structure. Rather, it seems minor changes of amino acid characteristics make the protein more active to insects. These findings were useful to set our strategies for subsequent optimization projects, especially saturation mutagenesis (Section 5).
This shuffling project reported by Hou et al.29) was for increasing the activity of Cry3Aa against D. virgifera. The activity of Cry3Aa against D. virgifera was as low as EC50=214 ppm according to the report. However, Cry3Aa is highly active against other coleopteran species such as Leptinotarsa decemlineata (Colorado potato beetle). D. virgifera and L. decemlineata belong to the same family of Chrysomelidae. It was expected that the activity of Cry3Aa against D. virgifera could be improved. Therefore, Cry3Aa was selected as another model material for single gene DNA shuffling. As described in Section 2.3, the shuffled cry genes were cloned in pMAL and expressed as MBP fusion proteins with a 6X His tag attached to the N-terminus of MBP.
Since the MBP-Cry3Aa fusion contains the activation site at the C-terminus of the Cry3Aa leader sequence, it was assumed that the Cry toxin would be fully activated in the insect digestive fluid. In fact, when MBP-Cry3 was incubated in vitro in the midgut fluid collected from D. virgifera, MBP was digested away from Cry3Aa within seconds.29) To prepare MBP-free Cry3Aa bioassay, MBP was removed from Cry3Aa with trypsin. When MBP-Cry3Aa was fed to D. virgifera, it was found the activity was EC50=16 ppm, about 13-fold higher than Cry3Aa without MBP. A similar activity increase of MBP-Cry was seen with other beetle-active Cry proteins such as Cry8Hb.29) The result suggests that the Cry3Aa’s low activity is due to the solubility of Cry3Aa at the pH of the insect digestive fluid. Since having E. coli MBP in Bt corn raises the regulatory hurdle, it was attempted to increase the solubility of Cry3A without MBP by DNA shuffling. The screening revealed 6 shuffled proteins showing high activity without MBP. Interestingly, the activity of all of these proteins were equally active between two different preparations, one with and the other without MBP. The best clone had EC50=9 ppm with MBP, but it was 7 ppm without MBP, about 30-fold activity increase. The activity difference of 2 ppm depending on MBP was within the standard deviation of multiple assay repeats. Similarly, all other 5 highly active variants had no significant activity differences between those with MBP and without. Figure 8 shows the mutations of a variant having the highest activity. Since the X-ray structure of Cry3Aa (pdb:1dlc)5) does not have an N-terminal region of 60 amino acid residues, it is difficult to predict the structure of the MBP-Cry3Aa fusion without the structural information of this region. However, the Cry2Aa structure (1isp),7) which includes a corresponding part of the missing N-terminal region of Cry3Aa, suggests a possible structure of MBP-Cry3Aa. 
From the structure of Cry2Aa, it is assumed that there is an additional alpha helix (α0) tightly folding on one side of the Cry3Aa molecule. This side is shown in the structure image of Fig. 8-A. It is likely that MBP, which is linked to α0, is attached to a particular part of this side of Cry3Aa, as shown with a red circle in Fig. 8-A. There were three mutations in the red circle, all of which were mutations from Lys (K) to Glu (E). These solvent-exposed mutations should make Cry3Aa’s pKa substantially lower than that of the wild-type Cry3Aa, allowing a higher level of solubility in D. virgifera gut fluid, just like the MBP fusion. Interestingly, no mutations were found on the other side (Fig. 8-B) opposite to the MBP side (Fig. 8-A), nor in the receptor binding region of Domain II (Fig. 8, bottom of the structure, refer to Fig. 1, red circle).
Bt Cry1B has a diversified Domain structure as described in Section 3.1 (Fig. 4). We are interested in improving the H. zea activity of Cry1Bs having a Cry1Ac-type Domain III. Those Cry1Bs with a Cry1Ac-type Domain III were assayed repeatedly, and it was found that Cry1Bd had high activity against O. nubilalis (EC50=1.7 ppm), an important corn pest, but its EC50 against H. zea was >200 ppm. Cry1Bj showed high O. nubilalis activity (EC50=7.3 ppm), although it was not as active as Cry1Bd. Cry1Bj, however, had weak but significant H. zea activity (EC50=105 ppm), which is higher than that of Cry1Bd. In order to find which domain of those Cry1Bs is important for H. zea activity, we made two chimeric proteins, C1 and C2, both having the same Cry1Ac-type Domain III. C1 has Cry1Bd Domain I and Cry1Bj Domain II. C2 has Cry1Bj Domain I and Cry1Bd Domain II. When C1 and C2 were assayed against H. zea, C2 had the same activity as Cry1Bj, but C1’s activity was too low to calculate an EC50, just like that of Cry1Bd. This result indicates that the higher H. zea activity of Cry1Bj comes from its Domain I. The amino acid sequence differences in Domain I between Cry1Bd and Cry1Bj are clustered in α3, α4 and α6 but not in a way that would change the helix folding.
In order to improve the H. zea activity of Cry1Bj, family gene shuffling was conducted among Cry1B mature toxin genes amplified by PCR using Cry1B-specific primers from our internal Bt culture collection of about 1,000 isolates, together with individually synthesized Cry1Ka, Cry1Bd and Cry1Bj genes. Cry1Ka is a unique Cry1-type protein having a Domain I that is similar to all Cry1B-type Domain Is, although its Domains II and III are different from those of Cry1B. To understand the uniqueness of Cry1Ka, domain-wise sequence comparisons are included in the supplemental data (Fig. S10–S12). Cry1Ka was included in the shuffling to introduce diversity into Domain I of Cry1B. After shuffling, the library was screened using the activity of Cry1Bj as a reference, and a variant called B21, showing H. zea activity 3.4-fold higher than that of Cry1Bj, was found. The amino acid sequence of B21 was compared with the sequence of Cry1Bj. Most of the B21-specific amino acids were found in Domain II, especially in the β2 to β4 strands, with a few additional mutations in Domain I and Domain III (Supplemental Fig. S7). Then, the B21 sequence was introduced into Cry1Bj one amino acid residue at a time by site-directed mutagenesis, and the mutants were assayed against H. zea. No single amino acid mutant of Cry1Bj showed a significant activity increase, indicating that a combination of all or a part of the B21-specific amino acids is necessary for the activity increase.
In saturation mutagenesis, each amino acid residue is mutated to all other 19 amino acids. Then the individual mutants (variants) are assayed to determine the biological activity. A Bt Cry mature toxin is composed of about 670 amino acid residues. It is not realistic to produce and screen mutations at all amino acid residues, as the total number of mutants would be 12,730 (670 residues × 19 substitutions). Even though we have developed a high throughput insect assay, it would take many months to screen this many samples. Therefore, the saturation mutagenesis was done on selected amino acid residues. The site selection was made by different criteria, including prioritizing the solvent-exposed amino acid residues (i.e., surface residues). In addition, we drew on our experience with DNA shuffling, in which beneficial mutations were often found at amino acids having large side chains, for example the hydrophilic residues Arg, Lys, Asp, Asn, Glu and Gln. Mutations at these hydrophilic residues tend to modify the protein surface but not the backbone folding. In this section, “beneficial mutation” means any single amino acid mutation that increases the activity level. Since many beneficial mutants were found by selecting the solvent-exposed residues for saturation mutagenesis, no internal amino acids were mutated.
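The library-size arithmetic in this paragraph can be checked with a short sketch. The function name and the 100-site subset in the second call are illustrative assumptions only; the number of sites actually selected is not stated here.

```python
# Size of a saturation mutagenesis library: each mutated site is replaced
# by each of the other 19 amino acids.
ALTERNATIVES_PER_SITE = 19


def library_size(n_sites: int) -> int:
    """Return the number of single-site mutants for n_sites mutated positions."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return n_sites * ALTERNATIVES_PER_SITE


# Whole mature toxin of about 670 residues, as stated in the text.
full_toxin = library_size(670)

# Hypothetical subset of 100 solvent-exposed sites (assumed value, for scale only).
surface_subset = library_size(100)

print(full_toxin)      # 12730
print(surface_subset)  # 1900
```

Restricting the mutagenesis to a subset of surface sites shrinks the screening load in direct proportion to the number of sites dropped.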
5.2. Producing a structural model of Cry1Bj
It is important to have the three dimensional (3D) structure to select sites for saturation mutagenesis. There are X-ray structures of quite a few Cry proteins, such as Cry1Aa (1ciy),6) Cry1Ac (4ary),30) Cry3Aa (pdb:1dlc),5) Cry8Ea (3eb7)31) and more. When the primary amino acid sequences of those Cry proteins were compared with that of Cry1Bj, Cry1Bj Domain III was 95% similar to Cry1Ac Domain III as described in Section 3.1 (Fig. 4). However, Cry1Bj’s Domains I and II have a higher similarity to those of Cry8Ea, at a 75% overall similarity. This Domain I/II homology makes sense. Generally speaking, Cry1B and Cry8, at least some of them, have coleopteran activity like Cry3 and Cry1I. Domain IIs of Cry1B, Cry1I and Cry3 bind to the same class of receptors, ABCB1 (Section 1.2). Therefore, a template structure model using Cry8Ea (3eb7) Domains I and II and Cry1Ac (4ary) Domain III was built. This model structure was of good quality, with few stereo-chemical conflicts, because of the high sequence similarity. From this template, several 3D models were generated for Cry1Bj by MODELER 9.15 of Discovery Studio® (BIOVIA Corp. San Diego, CA, USA), and the model with the lowest optimized protein energy was selected as the final model, as shown in Supplemental Fig. S8-A.
5.3. A method of saturation mutagenesis used to mutate Cry1B
Saturation mutagenesis was performed on Cry1B backbone sequences using the QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, CA, USA). Each mutagenesis primer contained an NNK (N = A/C/G/T and K = G/T) codon at the mutation site and 15 to 20 flanking nucleotides on each side of the mutation site. Mutagenesis reactions were performed according to the protocol provided by the manufacturer with some modifications as described below: (1) template DNA for each reaction was 20 ng; (2) only one primer was used instead of two complementary strand primers; and (3) reaction mixtures were directly used to transform electro-competent E. coli BL21(DE3) purchased from Lucigen (Middleton, WI, USA). For each mutation reaction, 96 colonies of transformed E. coli clones were picked onto LB-agar-carb100 in 96-well plates. The plasmid was isolated from each clone and sequenced to confirm the presence of the intended mutation and the absence of unintended mutations. The mutagenesis goal was to produce all 19 other amino acids at each site. If any specific amino acid(s) was not found, the missing mutant(s) was produced by site-directed mutagenesis using a primer containing a specific codon encoding the missing amino acid. From each mutation site, 19 different amino acid mutants were selected and grown in Magic Media™-carb100 (Thermo Fisher Scientific, Waltham, MA, USA) in 96-well plates for protein production and activity screening.
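To illustrate why an NNK codon is a convenient choice and why some amino acids may still be missing after picking 96 colonies, the sketch below enumerates the NNK codons with the standard genetic code. The missing-mutant estimate rests on two simplifying assumptions that are not taken from the protocol: every NNK codon is equally likely, and every picked colony carries a mutated codon (no parental carry-over). The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N = A/C/G/T at positions 1 and 2, K = G/T at position 3.
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]

# Count how many NNK codons encode each amino acid ("*" marks a stop codon).
codon_counts = {}
for codon in nnk_codons:
    aa = CODON_TABLE[codon]
    codon_counts[aa] = codon_counts.get(aa, 0) + 1

encoded = sorted(aa for aa in codon_counts if aa != "*")


def prob_missing(aa: str, n_clones: int) -> float:
    """Chance that amino acid `aa` is absent from n_clones random NNK clones.

    Assumes uniform codon frequencies and that every clone is a mutant.
    """
    fraction = codon_counts[aa] / len(nnk_codons)
    return (1.0 - fraction) ** n_clones


print(len(nnk_codons))         # 32 codons
print(len(encoded))            # 20 amino acids
print(codon_counts["*"])       # 1 stop codon (TAG)
print(round(prob_missing("M", 96), 3))  # single-codon amino acid, 96 clones
```

Under these assumptions an amino acid encoded by a single NNK codon is still absent in a few percent of 96-colony picks, which is consistent with the need for a follow-up site-directed mutagenesis step to fill in missing mutants.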
In an alternative strategy, 48 colonies were picked for each mutation site, and mutant proteins were purified in 96-well plates without sequencing. These purified proteins were subjected to high throughput screening to determine the activity. The mutants showing the highest activity for each site were sequenced, and the sequence-confirmed mutations were included in the combinatorial assembly. This strategy is useful for finding a beneficial mutation quickly, but activity data for all 19 mutants help to understand the structure and function of the target protein. For example, saturation mutagenesis of a Lys site showed an activity increase only with mutations to Leu and Val, while all other amino acid mutations resulted in either no increase or even a loss. Assaying all sequenced mutants validated the assay result, because the two mutations from Lys to Leu and Val, which have similar side chains, were found equally capable of increasing the activity. It also indicated that this Lys site prefers large hydrophobic amino acids.
5.4. An example of saturation mutagenesis of shuffled Cry1B (B21)
To increase the activity of B21, a two-stage protein engineering strategy was developed. The strategy consisted of saturation mutagenesis as the first stage followed by combinatorial assembly as the second stage. By examining the Cry1Bj 3D structural model, B21 amino acid residues that are exposed to the solvent and unlikely to disturb the backbone folding after mutation were selected, and saturation mutagenesis was performed on those selected residues one at a time. As shown in Supplemental Fig. S8-B, the amino acid residues selected for mutagenesis covered a significant (>50%) portion of the surface of the molecule. After 19 mutations were made for each selected site of B21, those proteins were produced in E. coli and screened against H. zea using the high throughput screening method described in Section 2.1. From the set of 19 mutants for each site, the best mutant, i.e., the mutant exhibiting the highest Activity Index, was plotted on the primary sequence of B21 along with 2D structure assignments (Fig. 9). The Activity Index in Fig. 9 was calculated by dividing the EC50 of the parent B21 by that of the mutant. Any number above 1, e.g., 2, 3 etc., indicates a fold-increase of the activity. An Activity Index of 1 means that the mutant activity is the same as that of B21, and a value below 1 signifies decreased activity. The bar graph of Fig. 9 revealed particular sites where mutation produced the highest Activity Index values. Those sites are the α2–α3 loop and α6 of Domain I; the β3–β4 loop (compare to the result of DNA shuffling in Section 4.4), β10 and Loop3 of Domain II; and β18, β19 and β20 of Domain III. Besides those mutants showing Activity Index values above 3, there were numerous mutants showing an index between 2 and 3. Interestingly, the high index value mutants clustered in Domain III more than in the other domains (Fig. 9).
This finding indicates that Domain III has a higher potential for activity improvement. Since the Cry1Ac-type Domain III of Cry1Bj makes this protein active to H. zea, this finding of Domain III mutations is interesting. We call the sites where a mutation increased the activity substantially (i.e., Activity Index >2) “hot spots.” These hot spots will be the priority sites for our future optimization of other Cry proteins having similar 3D structures.
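The Activity Index and hot-spot definitions can be expressed as a minimal sketch. The site numbers, substitutions and EC50 values below are invented for illustration and are not data from Fig. 9; the function names are likewise hypothetical.

```python
HOT_SPOT_THRESHOLD = 2.0  # Activity Index > 2 defines a hot spot in the text


def activity_index(ec50_parent: float, ec50_mutant: float) -> float:
    """EC50 of the parent divided by EC50 of the mutant (higher is better)."""
    if ec50_parent <= 0 or ec50_mutant <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_parent / ec50_mutant


def best_mutant_per_site(ec50_parent, mutant_ec50):
    """For each site, keep the substitution with the lowest EC50.

    mutant_ec50 maps site -> {substitution: EC50 in ppm}.
    Returns site -> (substitution, Activity Index).
    """
    best = {}
    for site, substitutions in mutant_ec50.items():
        residue, ec50 = min(substitutions.items(), key=lambda item: item[1])
        best[site] = (residue, activity_index(ec50_parent, ec50))
    return best


def hot_spots(best):
    """Sites whose best mutant exceeds the hot-spot threshold."""
    return sorted(site for site, (_, index) in best.items()
                  if index > HOT_SPOT_THRESHOLD)


# Hypothetical screening data (ppm); the parent EC50 is also an assumed value.
parent_ec50 = 30.0
screen = {
    101: {"L": 10.0, "V": 12.0, "D": 60.0},
    250: {"E": 30.0, "Q": 45.0},
    512: {"S": 20.0, "T": 15.0},
}

best = best_mutant_per_site(parent_ec50, screen)
print(best[101])        # ('L', 3.0)
print(best[250])        # ('E', 1.0)
print(hot_spots(best))  # [101]
```

Site 512 in this toy data reaches an index of exactly 2 and is therefore not counted, since the definition requires a value above 2.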
Screening the mutants produced by saturation mutagenesis revealed beneficial mutants. Those mutants were combined to further increase the activity. The process is called combinatorial assembly, as shown in Supplemental Fig. S9. In our project, the assembly was done progressively, because this strategy creates fewer samples to bioassay. First, the one mutant showing the highest activity was selected and combined with the remaining mutants one by one to select the best combination. The best combination became the backbone for the next step, in which a third mutation was added. The process continued until the desired activity level was obtained. By repeating the combinatorial assembly, we identified a number of variants showing high H. zea activity (Fig. 10). One of those, called B64, had 30-fold higher activity than the parent Cry1Bj. B64 was cloned and expressed in corn, and the plants were challenged with H. zea larvae. The transgenic corn showed remarkable protection of both leaf disks and ears from H. zea, as shown in Fig. 10.
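The progressive assembly described above is, in effect, a greedy stepwise search. The sketch below shows one way it could be organized and why it needs fewer bioassays than testing every combination. The mutation labels, the per-mutation effects and the multiplicative toy assay are all assumptions made for illustration; a real bioassay would replace `toy_assay`, and real mutations need not combine multiplicatively.

```python
from math import prod
from typing import Callable, FrozenSet, Iterable, Tuple

Assay = Callable[[FrozenSet[str]], float]


def progressive_assembly(mutations: Iterable[str], assay: Assay,
                         target: float) -> Tuple[FrozenSet[str], float, int]:
    """Greedy stepwise assembly of beneficial mutations.

    At each step every remaining mutation is added to the current backbone,
    the best combination becomes the new backbone, and the loop stops once
    the target activity is reached or no addition helps.
    Returns (combination, activity, number of variants assayed).
    """
    remaining = set(mutations)
    backbone: FrozenSet[str] = frozenset()
    best_activity = assay(backbone)  # parent protein, used as the reference
    n_assays = 0
    while remaining and best_activity < target:
        scored = [(assay(backbone | {m}), m) for m in sorted(remaining)]
        n_assays += len(scored)
        activity, winner = max(scored)
        if activity <= best_activity:
            break  # no further improvement available
        backbone = backbone | {winner}
        remaining.remove(winner)
        best_activity = activity
    return backbone, best_activity, n_assays


# Hypothetical fold-effects of four single mutations (invented values).
EFFECT = {"A": 2.0, "B": 1.5, "C": 3.0, "D": 0.8}


def toy_assay(combination: FrozenSet[str]) -> float:
    """Toy Activity Index: effects are assumed to multiply."""
    return prod(EFFECT[m] for m in combination)


combo, index, n_assays = progressive_assembly(EFFECT, toy_assay, target=8.0)
print(sorted(combo))         # ['A', 'B', 'C']
print(index)                 # 9.0
print(n_assays)              # 9
print(2 ** len(EFFECT) - 1)  # 15 variants if every combination were assayed
```

With only four candidate mutations the saving is small, but the gap widens quickly: the greedy search grows roughly with the square of the number of mutations, while exhaustive combination grows exponentially.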
The optimization method developed for improving the activity of Bt Cry1B against H. zea was applied to a D. virgifera-active, non-Bt protein without conducting the initial shuffling step. This protein was found in a gram-negative bacterium, Photobacterium piscinae.32) The optimization produced several variants showing over 30-fold activity improvements,33) confirming the feasibility of the saturation mutagenesis method not only with a Bt Cry protein but also with a non-Bt insecticidal protein.
In this article, three major protein engineering methods used to optimize Bt Cry proteins were reviewed. They are domain swapping, DNA shuffling and saturation mutagenesis. The latter two methods require high throughput screening. It was a major undertaking to develop high throughput methods for preparing a large number of protein samples and screening them by insect bioassay, but these methods proved useful for optimizing insecticidal proteins. The first commercial application of engineered Bt Cry proteins in Bt crops came from domain swapping. There had been a concern that the modification of Bt Cry proteins by protein engineering might produce a protein having unexpected traits such as toxicity against non-targets. Since the engineered Bt Cry proteins had to go through a stringent government regulatory process, which includes testing the digestibility in simulated gut fluid down to small, non-allergenic peptides and the toxicity against non-target insects and higher animals, the safety was assured. However, it took over 10 years to obtain government approval. No commercial GMO crops using Bt Cry proteins optimized by DNA shuffling or saturation mutagenesis have materialized yet, because these methods are relatively new. The saturation mutagenesis technology produced a 30-fold activity increase over the parent protein, and the optimized Bt Cry1B protein showed outstanding protection of corn from H. zea. Nevertheless, it will take many years until a transgenic corn using Cry proteins optimized by saturation mutagenesis obtains full government approval.
Dr. Hideo Ohkawa, Professor Emeritus of Kobe University, made arrangements for publishing this review article in the Journal of Pesticide Science. The author appreciates his support, his encouragement and his review of the manuscript. This article is produced as a business practice of Bacillus Tech LLC, a consulting company, and does not necessarily represent the opinions of the author’s previous employers. Although all information in this article was taken entirely from published materials, some of it may be patented. The author acknowledges the tremendous contributions made by his coworkers and supervisors, especially those core members originally from Maxygen, such as Ruth Cong, Dave Cerf, Drs. Gusui Wu, Pill Patten and Mike Lassner, to name a few. Drs. J.T. Hou and Michi Willcoxon joined our group later as project leaders of bioinformatics/assay automation and Cry1B optimization, respectively. Additionally, the author acknowledges external collaborators, especially Dr. Dirk Bosch and his colleagues at Wageningen Plant Research, Wageningen University in the Netherlands. Finally, the author would like to thank Rena Yamamoto for critically reading the manuscript.
The online version of this article contains supplementary material (Supplemental Fig. S1–S12), which is available at https://www.jstage.jst.go.jp/browse/jpestics/.