Fine mapping of Rf5 region for a sorghum fertility restorer gene and microsynteny analysis across grass species

Cytoplasmic male sterility (CMS) is widely used to control pollination in the production of commercial F1 hybrid seed in sorghum. So far, 6 major fertility restorer genes, Rf1 to Rf6, have been reported in sorghum. Here, we fine-mapped the Rf5 locus on sorghum chromosome 5 using descendant populations of a ‘Nakei MS-3A’ × ‘JN43’ cross. The Rf5 locus was narrowed to a 140-kb region in the BTx623 genome (161 kb in JN43) with 16 predicted genes, including 6 homologous to the rice fertility restorer Rf1 (PPR.1 to PPR.6). These 6 homologs have tandem pentatricopeptide repeat (PPR) motifs. Many Rf genes encode PPR proteins, which bind RNA transcripts and modulate gene expression at the RNA level. No PPR genes were detected at the position corresponding to the Rf5 locus on the homologous chromosome of rice, foxtail millet, or maize, so this gene cluster may have originated by chromosome translocation and duplication after the divergence of sorghum from these species. Comparison of the sequences of these genes between fertile and CMS lines identified PPR.4 as the most plausible candidate gene for Rf5.


Introduction
Cytoplasmic male sterility (CMS) is widely used to control pollination in the production of commercial F1 hybrid seed, including that of sorghum (Sorghum bicolor (L.) Moench). CMS in flowering plants is characterized by a maternally inherited inability to produce functional pollen (Hanson and Bentolila 2004). It is often caused by abnormal transcripts originating from the mitochondrial genome that usually encode chimeric open reading frames (ORFs) containing part of a functional mitochondrial gene (Hanson and Bentolila 2004). The chimeric transcripts may limit the energy supply in mitochondria for pollen formation and result in sterile pollen (Schnable and Wise 1998).
The CMS phenotype is often restored by a fertility restorer (Rf) gene through the action of nuclear-derived RNA-binding proteins. In many cases, members of the large family of pentatricopeptide repeat (PPR) proteins (Barkan et al. 2012) are encoded by Rf genes (Dahan and Mireau 2013). Lines with Rf genes that restore CMS in hybrid cultivars are used as pollen donors. The PPR protein targets are primarily abnormal transcripts originating from the mitochondrial genome (reviewed by Small 2014, Manna 2015).
PPR proteins can bind RNA transcripts in a sequence-specific modular fashion. Each PPR forms two α-helices, and the series of helix-turn-helix motifs throughout the protein are stacked together to form a superhelix with an RNA-binding groove (Ban et al. 2013, Gully et al. 2015). The code of recognition between specific amino acids within the PPRs and the target RNA sequence has been reported (Barkan et al. 2012, Takenaka et al. 2013, Yagi et al. 2013). The recognition of transcripts is mediated by interactions between the target RNA sequence and the amino acids at positions 4 and 34 of each PPR motif (Barkan et al. 2012, Yagi et al. 2013). The PPR family in plants has two major subfamilies, called P and PLS. P-subfamily proteins contain a series of 35-amino-acid (aa) PPR motifs (P) and normally lack additional domains (Schmitz-Linneweber and Small 2008). Most of the functionally characterized Rf genes are placed in the P subfamily, the members of which induce the cleavage of sterility-associated mitochondrial RNAs in plant mitochondria (Dahan and Mireau 2013). PLS-subfamily proteins contain characteristic triplets of P (35 amino acids), L (long, 35-36 amino acids), and S (short, ~31 amino acids) motifs (Lurin et al. 2004). PLS proteins almost always possess C-terminal domains with "E" or "DYW" motifs, and members are thought to function mainly in RNA editing (Lurin et al. 2004).
CMS in sorghum was discovered through the interaction of "milo" (A1) cytoplasm with the "kafir" nuclear background (Stephens and Holland 1954). So far, six major fertility restorer genes, Rf1 to Rf6, have been reported in sorghum. High-resolution genetic and physical mapping showed that Rf1, on chromosome 8 (Chr. 8), encodes a PLS-subfamily PPR protein (Sobic.008G147400 = PPR13) and can restore pollen fertility in A1 cytoplasm (Klein et al. 2005). PPR13 contains 14 PPRs as well as a C-terminal E motif (Lurin et al. 2004). Comparison of the coding sequences of fertile and sterile plants revealed a nonsynonymous substitution (L26 to S27) adjacent to an amino acid insertion (R26) in the protein of sterile plants. A series of single nucleotide polymorphisms and a small insertion/deletion (indel) occur immediately 5ʹ of the PPR13 transcript. The Rf2 locus was limited to a 236-kbp region of Chr. 2 with 31 predicted ORFs, including a P-subfamily PPR gene with high sequence similarity to rice Rf1 (Jordan et al. 2010). The Rf2 locus was fine-mapped to a 10.32-kb region on Chr. 2 with only one candidate PPR gene, Sobic.002G057050, by newly developed simple sequence repeat (SSR) markers (Praveen et al. 2018), and a potential causative mutation for Rf2 was identified (Kante et al. 2018). Rf2 can restore pollen fertility in A1 cytoplasm. Rf3 and Rf4, on Chr. 7, restore CMS caused by A3 cytoplasm (Pring et al. 1999, Tang et al. 1998, Tang and Pring 2003). A3 cytoplasm has a chimeric ORF107 resulting from recombination/duplication with atp9 in mitochondrial DNA. Rf3 induces nucleolytic cleavage of ORF107 transcripts. Both Rf3 and Rf4 can restore pollen fertility by up to 50% (Kuhlman et al. 2006). Rf5, on Chr. 5 (Jordan et al. 2011), can restore pollen fertility in A1 and A2 cytoplasms. Linkage analyses delimited the Rf5 locus to a ~584-kb region on Chr. 5 that is predicted to encode 70 genes.
Genome informatic analysis identified seven P-subfamily PPR family members in the genomic regions of Rf5 as candidates of the causal Rf5 gene (Jordan et al. 2011). It was postulated that multiple PPRs at this locus might correspond to active restorer genes. Rf6, on Chr. 4, can restore pollen fertility in A1 and A2 cytoplasms (Praveen et al. 2015). Linkage analysis delimited the Rf6 locus to a 43-kb region, where one PPR gene, Sobic.004G004100, occurs among the six predicted genes. Analysis of peptide sequences indicated a frame-shift mutation with a stop codon (TGA) at 286 bp caused by an 11-bp insertion in the sequence of CMS line A127 (Praveen et al. 2015).
F1 hybrid breeding for sorghum in Japan started in the 1960s. CMS lines introduced from the USA were used as female parents and domestic Japanese lines were used as male parents. F1 cultivars with high biomass have been bred for forage use in Japan, but no studies of their restorer genes have been reported. Recently, we reported the mode of inheritance involved in fertility restoration in seven domestic F1 cultivars and found that Rf5 may be involved in the restoration of five of them (Kiyosawa et al. 2020). However, its known region is still large at 584 kb (Jordan et al. 2011), and fine mapping is needed to develop precise linkage DNA markers. Therefore, here we fine-mapped Rf5 using the progeny population of F1 cultivar 'Hazuki' used in a previous study (Kiyosawa et al. 2020) and performed nucleotide polymorphism analysis and microsynteny analysis among grass crops to obtain detailed information on Rf5.

Marker development and fine mapping of Rf5 region
We used 410 F4 plants and 188 F6 plants for genetic mapping. We used SSR markers (Yonemaru et al. 2009) and indel markers for fine mapping of the Rf5 region (Supplemental Table 1). Genomic DNA extraction, PCR amplification, and marker genotyping were the same as in our previous studies (Kiyosawa et al. 2020, Takai et al. 2012, Yonemaru et al. 2015).

BAC sequence analysis of Rf5 region
Bacterial artificial chromosome (BAC) libraries were constructed from young leaves of the CMS maintenance line 'Nakei MS-3B' (39 267 clones with an average insert size of 134 kb) and the restorer line 'JN43' (30 811 clones, 125 kb). The libraries were prepared by conventional methods, comprising a partial DNA digest with HindIII, size fractionation of high-molecular-weight DNA by pulsed-field gel electrophoresis (CHEF, Bio-Rad Laboratories, Hercules, CA, USA), vector ligation (pIndigo BAC-5, Epicentre Biotechnologies, Madison, WI, USA), and transformation into E. coli strain DH10B. Positive BAC clones covering the Rf5 region were screened from each library using tightly linked DNA markers through PCR amplification, and the identified BACs were shotgun-sequenced (Sasaki et al. 2002, Wu et al. 2003) to provide approximately tenfold sequence coverage. PCR analysis using markers MS3B_Chr05_2420485 and JN43_Chr05_2589446 (Supplemental Table 1) identified three BAC clones containing inserts from the Rf5 region: NaMSB-0088G09 from 'Nakei MS-3B' and JN43-0015A16 and JN43-0018F16 from 'JN43'. BAC sequences were deposited in the DDBJ (Acc. Nos. LC494266 and LC494267).
The reference genome sequence used was Sbicolor_v3.0.1_454, derived from 'BTx623' (McCormick et al. 2018). The structures of the PPR genes were manually corrected in consideration of frameshifts and splicing junctions. As a result, the structures of six PPR genes of 'BTx623' used here (PPR.1_B, PPR.2_B, PPR.3_B, PPR.4_B, PPR.5_B, and PPR.6_B) differed from the structures annotated in Sbicolor_v3.1.1_454. The positions and both sets of annotations of the six PPR genes in the Rf5 region are shown in Supplemental Tables 2 and 3. The mitochondrial genome sequence was derived from 'BTx623' (Acc. No. DQ984518).

Next-generation DNA sequencing of restorer and CMS lines
To elucidate the relationship between Rf5 function and sequences, we re-sequenced five restorer lines ('JN43', 'JN290', 'SDS7444', 'Chohin237.Daikoukaku', and 'JN503') and four CMS lines ('AMP-21', 'Nakei MS-3A', '(954149)A', and 'MS175 (932233)A') by short-read DNB-Seq technology. These lines were the parents of the five combinations used to detect a QTL in the Rf5 region in F2 populations (Kiyosawa et al. 2020). The data were deposited in the DDBJ Sequence Read Archive under accession number DRA012197. The informatics for the analysis of resequencing data was the same as reported previously (Kiyosawa et al. 2020) except for the reference genome sequence. To obtain correct alignments in the region rich in structural variation, we merged two sequence datasets aligned against two reference sequences: the original Sbicolor_v3.0.1_454, and a modified Sbicolor_v3.0.1_454 in which the Rf5 region was replaced by that of 'JN43'.
In each Rf5 region, there were 6 PPR genes predicted in 'BTx623' and 'JN43' and 5 in 'Nakei MS-3B', which had a chimeric PPR gene (PPR.3+4) caused by the fusion of the adjacent genes PPR.3 and PPR.4 (Fig. 1). With this exception, a dot plot with genomic sequences (Supplemental Fig. 2) supported no structural indels, translocations, or inversions involving the PPR genes; that is, the PPR genes were conserved among the 3 cultivars.

PPR.4 of 'JN43' is a candidate allele of functional Rf5
The PPR genes of these 3 cultivars had sequence diversity due to indels, fusions, frame shifts, and gain of a stop codon, and thus their protein lengths varied from 32 to 803 aa (Fig. 3). To distinguish each allele, we appended a cultivar suffix to each gene name (_B for 'BTx623', _J for 'JN43', and _N for 'Nakei MS-3B'; Table 1). PPR.4_B had 11 aa substitutions compared with PPR.4_J in the PPR motif, among which the T622N substitution may influence the affinity to target RNA (Fig. 4a). There were 127 aa differences between PPR.4_J and PPR.3+4_N (Supplemental Fig. 3); the differences between PPR.4_B and PPR.3+4_N seem to involve a loss of function. PPR.1 had an I766M substitution between 'JN43' and 'BTx623' (Supplemental Fig. 1). As the substitution lies in the C-terminal region, outside of the conserved PPR motif, it is unlikely to affect RNA recognition. PPR.3_J encoded 300 aa with only 5 PPR motifs, which suggests that it is not functional as an Rf: functionally characterized PPR-protein genes for Rf usually contain 11-18 PPR motifs, which recognize a specific RNA sequence (Dahan and Mireau 2013). The amino acid sequences of PPR.2, PPR.5, and PPR.6 were identical between 'JN43' and 'BTx623' (Table 1). These results suggest that PPR.1, PPR.2, PPR.3, PPR.5, and PPR.6 have equivalent functions in 'JN43' (restorer) and 'BTx623' (non-restorer), and thus are not candidates for the functional Rf5 gene. Thus, PPR.4 is the most plausible candidate for Rf5.

Discussion
Our previous study showed that Rf5 is the main restorer gene in Japanese CMS lines (Kiyosawa et al. 2020). By detailed mapping of the Rf5 locus, we delimited it to a 140-kb region of chromosome 5 in the BTx623 genome (161 kb in JN43), where six PPR genes were predicted. Since the Rf5 locus restores the fertility of A1 cytoplasm lines, as do the Rf1 and Rf2 loci, the Rf5 gene may have high homology with Rf1 and/or Rf2.
Sorghum branched from other Poaceae about 50 million years ago (Gaut 2002). Within the Panicoideae, sorghum branched from millet about 28 million years ago and from maize about 9 million years ago. Microsynteny analysis of the Rf5 locus detected no PPR genes in rice, foxtail millet, or maize on the corresponding chromosome. These results suggest that the Rf5 region arose more recently after the branching event from maize, and the PPR genes were translocated from another chromosome and duplicated on Chr. 5.
Nine restorers-of-fertility-like (RFL) PPR genes were reported as a cluster together with other tight clusters containing RFLs on Chr. 5 in sorghum (Sykes et al. 2017). This region had 50% (9/18 genes) of the identified RFLs within 554 kb (Sykes et al. 2017), which includes the Rf5 region. This region can be considered a hotspot for recombination at higher rates than expected (Sykes et al. 2017), which led to expansion of the family through a probable 'birth-and-death' process involving diversifying selection (Fujii et al. 2011, Geddy and Brown 2007). RFL genes are highly diverse among species and even among strains of the same species, showing strong signals of diversifying selection (Fujii et al. 2011, Geddy and Brown 2007). In our study, 'Nakei MS-3B' had very different PPR genes in the Rf5 region from those in the other two cultivars. The coevolution of fertility-restoring PPR genes with CMS-inducing mitochondrial genes has been described as an arms race between the mitochondrial and nuclear genomes (Touzet and Budar 2004). This is similar to the coevolution of plant genes for leucine-rich-repeat resistance proteins and rapidly evolving pathogen effectors in plant-pathogen interactions (Dahan and Mireau 2013). The difference between 'Nakei MS-3B' and the other two cultivars shows the rapid evolution of PPR genes in this region.

Diversity of PPR motif of sorghum Rf5 proteins
There were 19 PPR motifs predicted in PPR.4_J (Fig. 4a). The motifs consist of a repetitive sequence of 35 amino acids (36 in PPR2 and 38 in PPR3). In the PPR, amino acids Y3, L10, C11, G14, F23, M26, G30, and P33 are highly conserved (Fig. 4b). These features are consistent with the characteristics of the P-type PPR motifs (Yagi et al. 2013). Study of recognition between specific amino acids within the PPRs and target RNA sequences identified the amino acids at positions 4 and 34 of each PPR motif as important for the recognition of transcripts, and the amino acid at position 1 restricts the accuracy of the interactions between nucleotides and the protein (Barkan et al. 2012, Takenaka et al. 2013, Yagi et al. 2013). According to the recognition codes of the P-type PPR motifs (Yagi et al. 2013), the RNA sequence recognized by the 19 PPR motifs of PPR.4_J was predicted to be 5ʹ-AUCGACAAUGAUUYUCANY-3ʹ.
The sequence of the 17th PPR motif of PPR.4 differed between 'JN43' (restorer) and the others ('BTx623' and 'Nakei MS-3B'). In PPR.4_B, the amino acid at position 4 in the 17th PPR motif was changed from T to N, and the conserved 33rd P was changed to Q. In PPR.3+4_N also, the amino acid at position 4 changed from T to N (Supplemental Figs. 1, 3). Thus, both PPR.4_B and PPR.3+4_N had altered PPR motifs, and the nucleotide recognized by these motifs may be important. Two amino acid substitutions in the 17th PPR motif (T to N in the 4th position and   Supplemental Fig. 4). These data confirm PPR.4 as the most plausible candidate gene for Rf5. The change of these PPR motifs may cause the target RNA not to be recognized and result in loss of the restoration of fertility.
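The recognition logic described above can be illustrated with a minimal sketch. The pairing table is a simplified subset of the combinatorial code described by Barkan et al. (2012), mapped onto the position 4 and position 34 numbering used here; it is an assumption of this sketch and not the full code of Yagi et al. (2013) used for the prediction in this study. The motif residues in `toy_motifs` are hypothetical and do not represent the actual residues of PPR.4_J.

```python
# Simplified, assumed subset of the PPR recognition code:
# (residue at position 4, residue at position 34) -> recognized nucleotide.
# "Y" is the IUPAC code for a pyrimidine (C or U).
PPR_CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U",
    ("N", "N"): "Y",
    ("N", "S"): "C",
}


def predict_base(pos4: str, pos34: str) -> str:
    """Return the nucleotide predicted for one PPR motif, or 'N' if the pair is not in the table."""
    return PPR_CODE.get((pos4, pos34), "N")


def predict_target(motifs) -> str:
    """Concatenate per-motif predictions into a 5'->3' RNA target string."""
    return "".join(predict_base(p4, p34) for p4, p34 in motifs)


# Hypothetical motif residues, for illustration only.
toy_motifs = [("T", "N"), ("N", "D"), ("N", "S"), ("T", "D")]
print(predict_target(toy_motifs))

# Effect of a T-to-N change at position 4, assuming (hypothetically)
# that position 34 of the same motif is N.
print(predict_base("T", "N"), "->", predict_base("N", "N"))
```

Under this simplified table, a T-to-N change at position 4 shifts the predicted preference of a motif from a purine to a pyrimidine, which is the kind of change that could prevent recognition of the original target RNA.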
In 'JN43', which has a functional Rf5 allele, PPR.4 and PPR.2 are paired genes that seem to have been generated by segmental duplication (Fig. 3, Supplemental Fig. 5). PPR.2 also has 19 PPR motifs and seems intact and functional. However, 30 aa substitutions have already occurred between them (Supplemental Fig. 5). The substitutions N128D, N447D, N552D, G622D, and D692N (Supplemental Fig. 5) lie at positions involved in RNA recognition, which suggests that the two PPR proteins recognize different RNA sequences.
Other PPR genes in the Rf5 region (PPR.1, PPR.3, PPR.5, and PPR.6) have deletions or point mutations making them non-functional (Fig. 3). These results show the rapid evolution of PPR genes in sorghum.

What is the mitochondrial gene responsible for CMS?
To find genes responsible for CMS, researchers have compared whole mitochondrial genomes between CMS lines and maintainer lines in maize and wheat. Many differences, including indels and chimeric ORFs, have been found (Allen et al. 2007, Liu et al. 2011). However, the causal genes of the CMS restored by Rf5 in sorghum have not been identified yet. It is important to clarify the sequences of the corresponding mitochondrial genes derived from the CMS line that can be rescued by Rf5.
Three independent sequences similar to the RNA sequence recognized by the Rf5 protein were found in the mitochondrial genome sequence of 'BTx623' (468,628 bp). Two were found in non-coding regions, but the other matched part of the coding region of rps2b, which encodes mitochondrial ribosomal protein 2B. Part of the rps2b RNA sequence (5ʹ-CAAUGAUUCUCAAT-3ʹ) partially matched 5ʹ-AUCGACAAUGAUUYUCANY-3ʹ, which the PPR.4_J protein is predicted to recognize. We consider the mitochondrial rps2b gene as a candidate gene for CMS. The Rf5 protein (PPR.4_J) may bind to rps2b mRNA, stabilizing it. Interestingly, a maize PPR protein, EMP4 (empty pericarp 4), is necessary to regulate the correct expression of mitochondrial rps2b for seed development and plant growth (Gutierrez-Marcos et al. 2007). This is an example in which the stabilization of the mRNA of a mitochondrial gene by a PPR protein is required for plant development. Since the mitochondrial genome sequence from sorghum CMS lines has not been obtained yet, we used the whole sorghum mitochondrial sequence from 'BTx623'. 'ATx623' (the A line corresponding to 'BTx623') may have an indel or chimerization of one or more mitochondrial genes. Detailed comparison of the mitochondrial sequences of an A1 cytoplasmic male sterile line, 'ATx623', and its maintainer line, 'BTx623', may provide information on candidate genes for CMS. Interestingly, Rf5 restores not only A1 cytoplasmic sterility but also A2 cytoplasmic sterility, and finding mitochondrial sequences that interact with the Rf5 protein (PPR.4_J) and determining their relationship will provide important information for F1 breeding.
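The partial match between the rps2b fragment and the predicted recognition sequence can be checked with a short sketch that treats the IUPAC ambiguity codes in the predicted sequence (Y = C or U, N = any base) as wildcards. The function name and the choice to read the trailing T of the fragment as U are assumptions of this sketch, not part of the original analysis.

```python
# IUPAC ambiguity codes needed for the predicted recognition sequence.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "N": "ACGU",
}


def compatible_offsets(motif: str, fragment: str) -> list:
    """Return 0-based offsets in `motif` at which `fragment` is fully compatible.

    `motif` may contain IUPAC ambiguity codes; `fragment` is a plain
    sequence in which T is read as U.
    """
    fragment = fragment.upper().replace("T", "U")
    hits = []
    for start in range(len(motif) - len(fragment) + 1):
        window = motif[start:start + len(fragment)]
        if all(base in IUPAC_RNA[code] for code, base in zip(window, fragment)):
            hits.append(start)
    return hits


PREDICTED_TARGET = "AUCGACAAUGAUUYUCANY"  # predicted for PPR.4_J (19 nt)
RPS2B_FRAGMENT = "CAAUGAUUCUCAAT"         # rps2b fragment as given in the text (14 nt)

offsets = compatible_offsets(PREDICTED_TARGET, RPS2B_FRAGMENT)
print(offsets)
```

With these inputs the fragment is compatible at a single offset (0-based position 5), so it covers nucleotides 6 to 19 of the predicted sequence, that is, 14 of the 19 positions.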

Author Contribution Statement
AK and JY designed the experiments. AK, JY, HKN, JW, HKW, and KG carried out the experiments. AK, JY, HKW, and HM analyzed the data. JY, HKW, and HM wrote the paper. All authors reviewed and approved the final manuscript.

Acknowledgments
We are indebted to sorghum breeders for the experimental lines and the staff who performed the field work at the Nagano Animal Industry Experiment Station. We gratefully acknowledge Yukari Shimazu and Emi Abe for genotyping mapping populations. This work was supported by grants