Commonalities and Differences among Symbiosis Islands of Three Mesorhizobium loti Strains

To shed light on the breadth of the host range of Mesorhizobium loti strain NZP2037, we determined the sequence of the NZP2037 symbiosis island and compared it with the symbiosis islands of strains MAFF303099 and R7A. The 533-kb sequence of the NZP2037 symbiosis island, on which 504 genes were predicted, implied integration of the island into a phenylalanine-tRNA gene followed by genome rearrangement. Comparative analysis revealed that the core regions of the three symbiosis islands consisted of 165 genes. We also identified several NZP2037-specific genes with putative functions in nodulation-related events, suggesting that these genes contribute to broadening the host range of NZP2037.

the complete matching of the end-sequence on the seed sequence, and then confirmed by PCR using primer sets designed on the end region of the seed sequence.
DNA sequencing and data assembly. The nucleotide sequence of each BAC insert was determined according to the bridging shotgun method described previously (Sato et al. 1997). Briefly, the BAC DNAs were sonicated and then size-fractionated by agarose gel electrophoresis. Fractions of approximately 3.0 kb were cloned into pUC118. The plasmid DNA was amplified with TempliPhi (GE Healthcare, UK) and used as a template. Sequencing was performed using cycle sequencing kits (Dye-terminator Cycle Sequencing kit, Applied Biosystems, USA) with a DNA sequencer type 3730 (Applied Biosystems, USA) according to the protocol recommended by the manufacturer. The end sequences from both ends of the clones, which together corresponded to about six times the length of an insert, were assembled using the Phred-Phrap programs (Phil Green, Univ. Washington, Seattle, USA). After the termini of each contig were extended by the primer extension method and the contigs were re-connected, the BAC inserts were assembled into a single contig in which more than 95% of the sequence was covered either by both strands or by multiple reads on one strand. The lower threshold of acceptability for generation of consensus sequences was set at a Phred score of 20 for each base.
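As a brief illustration of what the Phred threshold above means, the following sketch applies the standard Phred definition, Q = -10 log10(P), so that a score of 20 corresponds to an estimated error probability of 1 in 100. The function names are hypothetical and are not part of the Phred-Phrap programs.

```python
PHRED_MIN = 20  # per-base threshold stated in the text


def phred_to_error_prob(q):
    """Standard Phred definition: Q = -10 * log10(P), hence P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def low_quality_positions(quals, q_min=PHRED_MIN):
    """Return 0-based positions whose quality score falls below q_min."""
    return [i for i, q in enumerate(quals) if q < q_min]
```

A base with a score of exactly 20 is treated here as acceptable, which is an assumption about how the lower threshold was applied.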

<Gene assignment, annotation and comparative analysis>
Gene assignment and annotation. Protein-coding regions were predicted by a combination of four programs: Glimmer 3.02 (Arthur et al. 2007), IMC (in silico Molecular Cloning; In Silico Biology, Inc.), MGA (MetaGeneAnnotator) (Noguchi et al. 2008) and the EMBOSS getorf program (http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html). All protein-coding regions of 120 bp or longer were translated into amino acid sequences. The putative protein-encoding genes start with an ATG, GTG or TTG codon.
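The two constraints stated above (a start codon of ATG, GTG or TTG and a length of at least 120 bp) can be illustrated with a minimal six-frame ORF scan. This is only a sketch and does not reproduce Glimmer, IMC, MGA or getorf; the function names are hypothetical, and two assumptions are made: the length is counted from the start codon through the stop codon, and the first start codon after the previous stop is taken as the start.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # start codons named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard bacterial stop codons
MIN_LEN_BP = 120                       # minimum length named in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=MIN_LEN_BP):
    """Scan all six frames and return (strand, start, end, orf_seq) tuples.

    start and end are 0-based, end-exclusive offsets on the scanned strand;
    for the "-" strand they refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon in START_CODONS:
                    orf_start = i          # open an ORF at the first start codon
                elif orf_start is not None and codon in STOP_CODONS:
                    if i + 3 - orf_start >= min_len:
                        orfs.append((strand, orf_start, i + 3, s[orf_start:i + 3]))
                    orf_start = None       # close the ORF at the stop codon
    return orfs
```

In practice the candidates from such a scan would still need to be reconciled with the predictions of the other programs, a step the text does not describe in detail.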
All predicted genes were denoted by a serial number, following their order, with the prefix 'mln'. The putative protein-encoding genes were subjected to similarity searches against the nonredundant (nr) protein database from NCBI using the BLASTP program. Clusters of Orthologous Groups of proteins (COGs) were assigned to the predicted gene products by BLASTP analysis against the COG reference dataset (http://www.ncbi.nlm.nih.gov/COG/). A BLAST E-value of less than 10⁻⁴ was considered significant. After filtering, COG assignments of the putative gene products were generated according to the COG identification of the best-hit pair in the reference dataset.
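The COG assignment step described above (filter by E-value, then take the best hit) might be sketched as follows. The input layout, the function name and the mapping from reference proteins to COG identifiers are all hypothetical. Ranking by lowest E-value and then highest bit score is an assumption, since the text does not specify how the best hit was chosen.

```python
E_VALUE_CUTOFF = 1e-4  # "less than 10^-4" in the text


def assign_cogs(hits, protein_to_cog, cutoff=E_VALUE_CUTOFF):
    """Assign one COG per query from its best significant hit.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    protein_to_cog: dict mapping reference protein IDs to COG IDs.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= cutoff:
            continue                      # not significant
        if subject not in protein_to_cog:
            continue                      # reference protein without a COG
        rank = (evalue, -bitscore)        # lower E-value first, then higher score
        if query not in best or rank < best[query][0]:
            best[query] = (rank, subject)
    return {q: protein_to_cog[s] for q, (_, s) in best.items()}
```

For example, with made-up identifiers:

```python
hits = [
    ("gene1", "refA", 1e-30, 120.0),
    ("gene1", "refB", 1e-50, 180.0),
    ("gene2", "refA", 1e-3, 40.0),   # fails the E-value cutoff
]
cogs = assign_cogs(hits, {"refA": "COG_A", "refB": "COG_B"})
```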
Comparative analysis. The translated amino acid sequences of the assigned protein-encoding genes in the three M. loti strains were compared using the BLASTP program. Genes were considered conserved when they formed reciprocal BLAST best hits with an amino acid sequence identity ≥ 70%, a length coverage of the query sequence ≥ 80%, and an E-value ≤ 10⁻⁴.

scarified, surface-sterilized by immersion in concentrated sulfuric acid for 3 min, rinsed 10 times with sterile water, and germinated on 0.7% (w/v) agar plates at 24°C in the dark. After 2 to 3 days, seedlings were transferred either to agar slants made with B&D nitrogen-free medium and 0.9% agar (Broughton and Dilworth, 1971) or to a plant box (CUL-JAR300; Iwaki, Tokyo, Japan) containing sterile vermiculite watered with B&D nitrogen-free medium. Inoculation with M. loti strains and plant cultivation were performed as described previously (Okazaki et al. 2010).

): percentage of Total hits number
*One gene was newly predicted between mlr6398 and mlr6400. This gene is predicted to encode a conjugal transfer protein (TrbD).
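The reciprocal-best-hit criterion used in the comparative analysis (identity ≥ 70%, query coverage ≥ 80%, E-value ≤ 10⁻⁴) can be sketched for one pair of strains as below. The Hit record and function names are hypothetical. Two points are assumptions: the thresholds are applied before the best hit is selected, and the best hit is the one with the highest bit score; the text does not state either choice.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent amino acid identity
    aln_len: int      # aligned length on the query, in residues
    query_len: int    # full length of the query, in residues
    evalue: float
    bitscore: float


def passes(hit, min_identity=70.0, min_coverage=80.0, max_evalue=1e-4):
    """Apply the three thresholds given in the text to a single hit."""
    coverage = 100.0 * hit.aln_len / hit.query_len
    return (hit.identity >= min_identity
            and coverage >= min_coverage
            and hit.evalue <= max_evalue)


def best_hits(hits):
    """Map each query to the subject of its highest-scoring passing hit."""
    best = {}
    for h in hits:
        if not passes(h):
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)
```

A gene would then belong to the core set only if it forms such pairs across all three strains, which requires running the comparison for each pair of strains.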
Nitrogen fixation genes
* Indicates the presence of a putative nod-box as shown in Table S3.
$ Indicates the presence of a putative NifA-binding site

S5. Alignment of the C-terminal 50 amino acid sequences of three NZP2037 proteins in the T4SS region: mln450, mln452, and mln454. Arginine (R) residues in the amino acid sequences are indicated in red. Underlines indicate the consensus motif of the T4SS effector protein; in "R-X(7)-R-X-R-X-R-X-X(n)", R is arginine, X is any other amino acid, and the number in parentheses is the number of repetitions (30).
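A search for the consensus motif "R-X(7)-R-X-R-X-R-X-X(n)" in the C-terminal 50 residues of a protein could be sketched with a regular expression, as below. The pattern is a strict reading of the consensus, in which X matches any residue, and real effector sequences may deviate from it. The function name is hypothetical, and the sequence in the usage example is made up, not one of the mln450, mln452 or mln454 sequences.

```python
import re

# R, seven residues, then R-X-R-X-R followed by at least one more residue.
T4SS_MOTIF = re.compile(r"R.{7}R.R.R.")


def find_motif_in_c_terminus(protein, window=50):
    """Return (position, matched_text) pairs for motif hits in the C-terminal window.

    Positions are 0-based offsets in the full protein sequence.
    """
    tail = protein[-window:]
    offset = len(protein) - len(tail)
    return [(offset + m.start(), m.group()) for m in T4SS_MOTIF.finditer(tail)]
```

For example, with an artificial sequence:

```python
toy = "M" + "A" * 60 + "RAAAAAAARARARA" + "GGG"
matches = find_motif_in_c_terminus(toy)
```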