2016 Volume 5 Issue 1 Pages A0049
We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the masses of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of the information on RSPs of eukaryotic fungi registered in public protein databases, through the characterization of ribosomal protein fractions extracted from the genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we found that the public protein databases harbor two problems. First, the RSP names are inconsistent, so we have provisionally unified them using the yeast naming system. Second, and more seriously, many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have the potential to be useful biomarkers for identifying clinical isolates of A. fumigatus.
Aspergillus is a diverse genus of very common fungi that have high economic and social impact.1) Some strains are used industrially for microbial fermentation and production of organic compounds and enzymes. Several Aspergillus species are also known to be causative agents of mycoses, namely aspergilloses, including allergic bronchopulmonary aspergillosis, aspergilloma, and invasive aspergillosis.2) Because susceptibilities to antifungal agents vary according to Aspergillus species, accurate identification of unknown Aspergillus clinical isolates is the key to selecting an appropriate antifungal agent.
Identification of Aspergillus species has been traditionally performed based on the morphology of the conidia and conidiogeneses.1,2) However, morphological discrimination is subjective and requires special skills and experience. This has led to the increasing use of DNA-based characterizations to determine Aspergillus species. Identification of Aspergillus species has been reported using the internal transcribed spacer (ITS) region between the 18S, 5.8S, and 28S ribosomal RNA (rRNA) genes,3) the D1/D2 region of the 28S rRNA gene4) and the housekeeping genes such as β-tubulin5) and calmodulin6) genes.
On the other hand, we have proposed a ribosomal protein-based MALDI-TOF MS method for bacterial characterization.7–14) Our method can identify the species of a bacterium based on the profiles of its ribosomal subunit proteins (RSPs), which are highly abundant housekeeping proteins and easily observed by MALDI-TOF MS. The results of identification at the species level and discrimination at the strain level are correlated with the molecular evolution of these housekeeping proteins. Prokaryotic (bacterial) ribosomes consist of more than 50 subunit proteins, so using RSPs as biomarkers gives results equivalent to analyzing many genes. The key to the RSP-based method is the reliability of the reference mass list of RSP biomarkers, the preparation of which is supported by bioinformatics. The theoretical masses of RSP biomarkers can be calculated from their amino acid sequences registered in public protein databases such as the National Center for Biotechnology Information (NCBI) database and the UniProt Knowledgebase (UniProtKB). This method therefore has the potential for universal use, since it is not restricted to commercial databases.
To extend this ribosomal protein-based method to the identification of eukaryotic Aspergillus species, we first attempted to characterize RSPs of various genome-sequenced Aspergillus strains by MALDI-TOF MS. However, most RSPs in every strain were difficult to assign. We found that the difficulty is mainly caused by two problems in the public protein databases.
The first problem originates from the confused nomenclature of fungal RSPs. Prokaryotic (bacterial) ribosomes consist of 57 kinds of RSPs, whereas eukaryotic ribosomes typically consist of 78 RSPs. This difference in number leads to disagreements in the names of RSPs. So far, one nomenclature has been proposed for prokaryotes, based on Escherichia coli, while two have been proposed for eukaryotes, based on yeast and rat. Names based on these different nomenclatures are now mixed together, which makes it difficult to search databases and references by RSP name. Although a unified naming system for RSPs has also been proposed,15) this proposal is not employed in the public protein databases at this time.
The second problem is that many amino acid sequences in the databases seem to be incorrect. Unlike prokaryotic genes, the genes of eukaryotes, including Aspergillus fungi, contain introns. We performed a homology analysis of RSPs of Aspergillus species and found regions of low homology in the amino acid sequences. Because housekeeping ribosomal proteins should be highly conserved, we speculated that the introns may be mis-annotated. The sequences of RSPs could therefore be corrected by combining in silico inspection by sequence homology analysis with verification of the expressed mass of RSPs by MALDI-TOF MS measurements.
In this paper, we describe the detailed procedures for verifying and correcting the information on RSPs (i.e., protein names, intron sequences, amino acid sequences, and post-translational modifications) using two genome-sequenced strains of A. fumigatus as a model.
The genome-sequenced strains of A. fumigatus Af293 (=IFM 54229) and A1163 (=IFM 53842), the neotype strain IFM 57323NT, and a clinical isolate of IFM 62104 were provided by Chiba University’s Medical Mycology Research Center. The genome-sequenced strains and IFM 57323NT were grown in potato dextrose broth (PDB) medium at 25°C for three days. The IFM 62104 strain was grown in PDB medium at 37°C for four days.
After incubation, the fungal bodies were harvested by centrifugation of the culture medium at 5,800 g for 10 min, and ground (twice, for 20 s each time, at 7,000 rpm) with zirconia silica beads (ca. 1,300 mg, 0.1 mm in diameter) in a MagNA Lyser (Roche). After removing the beads and cell debris by centrifugation, the fungal lysates were subjected to ultracentrifugation at 73,400 g for 1 h to isolate the ribosome fraction as a precipitate. The resulting ribosome fraction was solubilized in 20–50 μL of 50% acetonitrile containing 1% trifluoroacetic acid (TFA), and then subjected to MALDI-TOF MS measurements.
MALDI-TOF MS measurements
Sample preparation, apparatus, and MALDI-TOF MS data acquisition methods were similar to those described in our previous papers.7–14) The ribosomal protein sample solution (approx. 1 μL) was spotted onto the MALDI target. Approx. 1 μL of sinapinic acid matrix solution at a concentration of 20 mg/mL in 50% acetonitrile with 1% trifluoroacetic acid was then overlaid and dried in air. The MALDI-TOF MS measurements were performed using an AXIMA CFR-plus time-of-flight mass spectrometer (Shimadzu/Kratos, Kyoto, Japan) in positive linear mode. More than three mass spectra for each sample were collected from more than three sample spots. External mass calibration was carried out using three peaks of ACTH (human, 1–24) ([M+H]+, m/z 2932.6) and myoglobin ([M+H]+, m/z 16952.6 and [M+2H]2+, m/z 8476.8) as references.
Calculation of the theoretical mass of RSPs
The amino acid sequence of each RSP was obtained from UniProtKB (http://www.uniprot.org/). The sequence mass of each RSP was predicted using the Compute pI/Mw tool on the ExPASy proteomics server (http://www.expasy.org/tools/pi_tool.html), with N-terminal methionine loss considered first as a possible post-translational modification. The possibilities of other modifications are discussed below in the Results and Discussion section. The theoretical mass of each expressed RSP was calculated as the [M+H]+ ion.
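The mass calculation described above can be illustrated with a minimal sketch. It is not the Compute pI/Mw tool itself: the residue masses are standard average values, the proton mass of 1.00728 Da is our assumption for the [M+H]+ conversion, and the function name `mh_plus` is hypothetical.

```python
# Average (isotope-weighted) amino acid residue masses in Da (standard values).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524   # one water per chain (free N- and C-termini)
PROTON = 1.00728   # assumed mass added for the singly protonated ion


def mh_plus(sequence, remove_initial_met=True):
    """Average [M+H]+ mass of a protein sequence (hypothetical helper).

    If remove_initial_met is True and the sequence starts with Met,
    the N-terminal methionine is dropped before summing residue masses.
    """
    seq = sequence.upper()
    if remove_initial_met and seq.startswith("M"):
        seq = seq[1:]
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER + PROTON


# Toy tripeptide, not a real RSP: compare masses with and without Met loss.
with_met_loss = mh_plus("MGG")
without_met_loss = mh_plus("MGG", remove_initial_met=False)
```

The difference between the two values is the methionine residue mass (about 131.2 Da), which is the shift to look for when a registered sequence is otherwise correct.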
The nomenclature of RSPs is in a state of confusion. Names are typically composed of a letter (L for large subunit proteins and S for small subunit proteins) and a number, but the numbering rule differs between species. The first nomenclature of RSPs was proposed for bacterial (Escherichia coli) RSPs in 1971.16) For eukaryotic RSPs, mammalian (rat) RSPs were the first to be characterized and named,17) and a proposal for the yeast (Saccharomyces cerevisiae) RSP naming system18) followed. To resolve the nomenclatural confusion, a unified naming system for RSPs has been discussed, in which homologous RSPs are assigned the same name, independent of species. The first proposal was based on protein families,19) and it was further modified into a new system for naming RSPs proposed in 2014.15) Unfortunately, the new unified naming system15) is not employed in the public protein databases at this time. This paper therefore provisionally adopts the yeast naming system18) for convenience of homology search, since Aspergillus and Saccharomyces are related organisms.
To unify the name of each A. fumigatus RSP into the yeast name, a homology search of A. fumigatus RSPs against the RSPs of S. cerevisiae was performed using the NCBI blastp program (http://blast.ncbi.nlm.nih.gov/). Table 1 summarizes the data on A. fumigatus RSPs: the accession number and registered name in UniProtKB, the name under the yeast naming system, and the name under the unified naming system as a reference for the future. Most of the RSPs of A. fumigatus registered in UniProtKB were named using the yeast naming system. The remaining RSPs, named using another naming system, were renamed to the yeast name in the following manner. For example, L37a of A. fumigatus Af293, registered in UniProtKB as Q4WZH8, showed high homology with S. cerevisiae L43A (where A denotes one of the duplicate genes). Because the name L37a is based on the mammalian ribosome naming system, it was renamed L43 in line with the yeast name (incidentally, it corresponds to eL43 in the unified naming system15)). This L43 protein showed more than 95% similarity to L43 of A. clavatus NRRL1, A. terreus NIH2624, and A. niger CBS513.88. These homologs in other Aspergillus species are registered under the yeast name. To prevent such confusion, all RSPs of A. fumigatus Af293 and A1163 were unified to the yeast name.
Yeast name | Unified name | Designation in UniProt (Af293) | Accession No. in UniProt (Af293) | Designation in UniProt (A1163) | Accession No. in UniProt (A1163) |
---|---|---|---|---|---|
Large subunit proteins | |||||
L1 | uL1 | Ribosomal protein | E9QU85 | Ribosomal protein | B0XQU0 |
L2 | uL2 | L8, putative | Q4WTW7 | L8, putative | B0Y3E2 |
L3 | uL3 | L3 | Q8NKF4 | L3 | B0XSL2 |
L4 | uL4 | L4, putative | Q4WEH4 | L4 | B0Y2P9 |
L5 | uL18 | L5, putative | Q4WSG1 | L5 | B0XR75 |
L6 | eL6 | L6 | Q4WSZ2 | L6 | B0XQN2 |
L7 | uL30 | L7 | Q4W9S6 | L7 | B0YEG9 |
L8 | eL8 | L7A | Q4WLM5 | L7A | B0XM24 |
L9 | uL6 | L9, putative | Q4WTJ3 | L9, putative | B0XQ32 |
L10 | uL16 | L10 | Q4X1P8 | L10 | B0XRW7 |
L11 | uL5 | L11 | Q4WP20 | L11 | B0Y5W6 |
L12 | uL11 | L12 | Q4WK81 | L12 | B0XMZ1 |
L13 | eL13 | L13 | Q4W9L9 | L13 | B0YEB1 |
L14 | eL14 | L14 | Q4WD82 | L14 | B0YD67 |
L15 | eL15 | L15 | Q4WJV5 | L15 | B0XNP4 |
L16 | uL13 | L16a | Q4WJH1 | L16a | B0XPG3 |
L17 | uL22 | L17 | Q6MY48 | L17 | B0XMS0 |
L18 | eL18 | L18 | Q4X279 | L18 | B0XW36 |
L19 | eL19 | L19 | Q4X220 | L19 | B0XW91 |
L20 | eL20 | L20 | Q4WJW9 | L20 | B0XNN1 |
L21 | eL21 | L21, putative | Q4WWT1 | L21, putative | B0XYU3 |
L22 | eL22 | L22, putative | Q4WYA0 | L22, putative | B0XWY6 |
L23 | uL14 | Alkaline serine protease | Q4WI20 | Alkaline serine protease | B0XUE5 |
L24 | eL24 | L24a | Q4WCU3 | L24a | B0YDK4 |
L25 | uL23 | L23 | Q4WTP5 | L23 | B0Y372 |
L26 | uL24 | L26 | Q4WM42 | L26 | B0Y8G7 |
L27 | eL27 | L27 | Q4WJD7 | L27e | B0XPJ7 |
L28 | uL15 | L27a, putative | Q4WWF0 | L27a, putative | B0XZ73 |
L29 | eL29 | L29, putative | Q4WKA9 | L29, putative | B0XMW3 |
L30 | eL30 | L30, putative | Q4X1P9 | L30, putative | B0XRW6 |
L31 | eL31 | L31e | Q4WLK1 | L31e | B0XLZ0 |
L32 | eL32 | L32 | Q4WZN0 | L32 | B0XV02 |
L33 | eL33 | L35Ae | Q4WX73 | L35Ae | B0XYE5 |
L34 | eL34 | L34, putative | Q4WI54 | L34 protein, putative | B0XUB0 |
L35 | uL29 | L35 | Q4WT53 | L35 | B0XQH1 |
L36 | eL36 | L36 | Q4WNZ0 | L36 | B0Y5T6 |
L37 | eL37 | L37 | Q4WWR1 | L37 | B0XYW1 |
L38 | eL38 | L38, putative | Q4WP31 | Rpl38, putative | B0Y5X8 |
L39 | eL39 | — | — | — | — |
L40 | eL40 | Ubiquitin UbiA, putative | A4D9S6 | Ubiquitin UbiA, putative | B0XNB9 |
L42 | eL42 | L44 | Q4X205 | Uncharacterized protein | B0XWA6 |
L43 | eL43 | L37a | Q4WZH8 | L37a | B0XVB7 |
P0 | uL10 | P0 | Q4WJR3 | — | — |
P1 | P1/P2 | P1 | Q9HGV0 | P1 | B0XPQ5 |
P2 | P1/P2 | P2 | Q9UUZ6 | P2/allergen Asp F 8 | B0XS47 |
Small subunit proteins | |||||
S0 | uS2 | S0 | Q4WYK1 | S0 | B0XWG9 |
S1 | eS1 | S1 | Q4WTM9 | S1 | B0Y356 |
S2 | uS5 | S5 | Q4WAI8 | S5 | B0YBW2 |
S3 | uS3 | S3, putative | Q4WJK8 | S3, putative | B0XP55 |
S4 | eS4 | S4 | Q4WWR9 | S4 | B0XYV4 |
S5 | uS7 | S5, putative | Q4WRU9 | S5, putative | B0XN49 |
S6 | eS6 | S6 | Q4WPX5 | S6 | B0Y6R5 |
S7 | eS7 | S7e | Q4WXU5 | S7e | B0XXS8 |
S8 | eS8 | S8 | Q4WJZ0 | S8 | B0XNE5 |
S9 | uS4 | S9 | Q4WWT2 | S9 | B0XYU2 |
S10 | eS10 | S10b | Q4WLQ8 | S10b | B0Y8V2 |
S11 | uS17 | S11 | Q4WHU8 | S11 | B0XUT5 |
S12 | eS12 | S12 | Q4WJM1 | S12 | B0XP41 |
S13 | uS15 | S13 | Q4WGJ9 | S13 | B0YCP0 |
S14 | uS11 | S11 | Q4X1C6 | S11 | B0XS79 |
S15 | uS19 | S15, putative | Q4X1G1 | S15, putative | B0XS46 |
S16 | uS9 | Rps16, putative | Q4X1C0 | S9 | B0XS84 |
S17 | eS17 | S17, putative | Q4X1E0 | S17, putative | B0XS66 |
S18 | uS13 | S13p/S18e | Q4WLH1 | S13p/S18e | B0XM75 |
S19 | eS19 | S19 | Q4WJN7 | S19 | B0XP26 |
S20 | uS10 | S10a | Q4WIE3 | S10a | B0XTV5 |
S21 | eS21 | S21 | Q4WI01 | S21 | B0XUN2 |
S22 | uS8 | S22 | Q4WRN1 | S22 | B0XNI4 |
S23 | uS12 | S23 | Q873W8 | S23 (S12) | B0XQ66 |
S24 | eS24 | S24 | Q4WAQ6 | S24 | B0YC29 |
S25 | eS25 | S25, putative | Q4WRF2 | — | — |
S26 | eS26 | S26 | Q4WJ94 | S26 | B0XPP9 |
S27 | eS27 | S27 | Q4WWP9 | S27 | B0XYX4 |
S28 | eS28 | S28e | Q4WGB8 | S28e | B0YCF7 |
S29 | uS14 | S29, putative | Q4WLQ2 | S29, putative | B0Y8V8 |
S30 | eS30 | S30/ubiquitin fusion | Q4WCU4 | S30/ubiquitin fusion | B0YDK3 |
S31 | eS31 | Ubiquitin (UbiC), putative | Q4WXZ8 | Ubiquitin (UbiC), putative | B0XXM3 |
Ribosomal proteins L40, S30, and S31 are synthesized as fusion proteins with ubiquitin20,21) (note that S31 is assigned as S27a in ref. 20). There are several different types of ubiquitin, all of which are highly conserved and well characterized, so identifying the ubiquitin part of a fusion protein sequence is an easy task. In UniProtKB, L40 is registered as “Ubiquitin UbiA” (accession numbers: A4D9S6 for Af293 and B0XNB9 for A1163). In this fusion protein, ubiquitin forms the N-terminal 76 amino acids, whereas L40 is the remaining C-terminal 52 amino acids.20) Similarly, in S31, registered as “Ubiquitin (UbiC)” (Q4WXZ8 and B0XXM3), the N-terminal 76 amino acids are ubiquitin and the remaining C-terminal part is S31. Adding to the confusion, the entry for S30, which is registered as “S30/ubiquitin fusion” (Q4WCU4 and B0YDK3), is not a fusion protein: the full length of the registered amino acid sequence corresponds to S30.
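As a minimal sketch of how the RSP part can be separated from a registered fusion entry, the following assumes only what is stated above, namely that ubiquitin occupies the N-terminal 76 residues. The function name and the placeholder sequence are hypothetical; a real entry would be downloaded from UniProtKB.

```python
UBIQUITIN_LENGTH = 76  # N-terminal ubiquitin moiety, as stated in the text


def split_ubiquitin_fusion(fusion_sequence):
    """Split a ubiquitin fusion entry into (ubiquitin, ribosomal protein)."""
    if len(fusion_sequence) <= UBIQUITIN_LENGTH:
        raise ValueError("sequence is too short to be a ubiquitin fusion")
    return (fusion_sequence[:UBIQUITIN_LENGTH],
            fusion_sequence[UBIQUITIN_LENGTH:])


# Placeholder of 128 residues (76 + 52), NOT the real "Ubiquitin UbiA" sequence.
placeholder_entry = "U" * 76 + "R" * 52
ubiquitin_part, l40_part = split_ubiquitin_fusion(placeholder_entry)
```

Only the C-terminal part would then be used for the theoretical mass, without N-terminal methionine loss, since it does not start at a translation initiation site.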
The page for alkaline serine protease in UniProtKB (Q4WI20) includes the “ribosomal protein L14P family” in the Family & Domains field. L14P is the bacterial RSP name, which corresponds to yeast L23. The amino acid sequence of this protein showed a high homology with L23 of S. cerevisiae, so the name of this protein was changed to L23. All the names of A. fumigatus RSPs were verified and changed to the yeast name using this procedure.
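The renaming step can be pictured as a lookup keyed on the UniProtKB accession number rather than on the registered designation. The sketch below uses a few rows taken from Table 1; the data structure and function names are our own illustration, not part of any database API.

```python
from collections import defaultdict

# accession -> (designation in UniProtKB, yeast name, unified name); rows from Table 1
TABLE1_EXCERPT = {
    "Q4WZH8": ("L37a", "L43", "eL43"),
    "Q4WI20": ("Alkaline serine protease", "L23", "uL14"),
    "Q4WTP5": ("L23", "L25", "uL23"),
    "Q4WHU8": ("S11", "S11", "uS17"),
    "Q4X1C6": ("S11", "S14", "uS11"),
}


def yeast_name(accession):
    """Look up the yeast-system name for an accession number."""
    return TABLE1_EXCERPT[accession][1]


def ambiguous_designations(table):
    """Registered designations that correspond to more than one yeast name."""
    names = defaultdict(set)
    for designation, yeast, _unified in table.values():
        names[designation].add(yeast)
    return {d: sorted(y) for d, y in names.items() if len(y) > 1}


clashes = ambiguous_designations(TABLE1_EXCERPT)
```

In this excerpt the designation "S11" is shared by two different proteins (yeast S11 and S14), which is why searching by registered name alone is unreliable.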
Observation of MALDI-TOF mass spectra and peak assignment
The next step is the calculation of the theoretical mass of each RSP based on the corresponding amino acid sequence obtained from UniProtKB. The theoretical mass was then compared with the observed mass. Figure 1 shows the mass spectra of the ribosomal protein fraction prepared from A. fumigatus Af293 and A1163, with the peaks under m/z 20,000 assigned. Ultimately, 31 RSPs could be assigned, but at this stage only eight peaks could be assigned for each strain when the registered amino acid sequences in UniProtKB were used and only N-terminal methionine loss was taken into account. These peaks are indicated by the boxed protein names in Fig. 1. In our previous studies of bacterial RSPs,7–14) most could be assigned by referring to the theoretical mass calculated from the registered amino acid sequences while only considering N-terminal methionine loss. The main reasons why only eight RSPs could be assigned are presumably that (1) many incorrect amino acid sequences are registered in the protein databases and (2) post-translational modifications other than N-terminal methionine loss occur. The following section discusses the actual state of the registered information, how erroneous sequences can be corrected, and how post-translational modifications can be inferred.
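The comparison of theoretical and observed masses amounts to a tolerance-window match. The sketch below uses three calculated and observed values for Af293 from Table 2; the 200 ppm tolerance, the function name, and the extra "hypothetical" entry are assumptions made for illustration, since no numerical matching criterion is stated in this paper.

```python
def assign_peaks(theoretical, observed, tol_ppm=200.0):
    """Assign each theoretical [M+H]+ mass to the closest observed peak
    within a relative tolerance (tol_ppm is an assumed value)."""
    assignments = {}
    for name, calc in theoretical.items():
        window = calc * tol_ppm / 1e6
        candidates = [mz for mz in observed if abs(mz - calc) <= window]
        if candidates:
            assignments[name] = min(candidates, key=lambda mz: abs(mz - calc))
    return assignments


# Calculated masses from Table 2 (Af293) plus one made-up entry with no peak.
theoretical = {"L39": 6151.2, "S29": 6646.7, "S30": 6789.1, "hypothetical": 7000.0}
# Observed masses from Table 2 (Af293).
observed = [6001.7, 6151.4, 6646.5, 6789.5, 7457.0]

assigned = assign_peaks(theoretical, observed)
unassigned = sorted(set(theoretical) - set(assigned))
```

An entry that stays unassigned, like the made-up one here, is the signal to inspect the registered sequence or to consider a post-translational modification.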
Incorrectly registered amino acid sequences in bacterial RSPs were mainly caused by mis-annotation of start codons.9,12) In this study, we found that incorrect sequences of eukaryotic RSPs of A. fumigatus were caused by mis-annotation of the exon/intron structure. Accurate coding DNA sequence (CDS) was determined by a combination of informatics procedures involving a homology search and a manual inspection of the DNA sequence of the corresponding genes, followed by confirmation of the correct mass of the expressed RSPs by MALDI-TOF MS measurements. The details of the correction procedures are described below.
The amino acid sequences of RSPs tend to be highly conserved, and show high homology with the corresponding proteins of other species. However, the RSPs that could not be assigned at the beginning tended to have registered sequence lengths that differed from those of their homologs. For example, Fig. 2 shows the multiple alignment of S29 of A. fumigatus, for which no peak could be observed at the calculated mass, with S29 of other Aspergillus species such as A. clavatus NRRL1, A. nidulans FGSC A4, and A. niger CBS513.88. The amino acid sequences between positions 1 and 54 are highly conserved between these strains, while the homology and length of the C-terminal side are markedly different. Eukaryotic S29 is highly conserved from yeast to humans,22) and has 56 amino acids containing a specific zinc finger-like motif (C-x-x-C).23) Since S29 of A. niger CBS513.88 and A. nidulans FGSC A4 have the zinc finger-like motif and a length of 56 amino acids, these sequences are more likely to be correct. The DNA sequence of the S29 gene (rps29) of A. fumigatus Af293 was therefore compared to that of A. niger CBS513.88.
The rps29 gene of A. niger CBS513.88 is located on c482296-481588 (708 bp) of supercontig An06 (NT_166522.1 in NCBI) and consists of 5 exons and 4 introns. The rps29 gene of A. fumigatus Af293 is located on c3211760-3211177 (583 bp) of chromosome 6 (NC_007199.1 in NCBI) and is also annotated with 5 exons and 4 introns. Figure 3 shows the sequence alignment of these genes, with exon regions underlined. In spite of the high sequence similarity of exon-1 to exon-3, the length of exon-4 is different: it is 57 bp for A. niger CBS513.88 and 61 bp for A. fumigatus Af293. Thus, the difference of 4 bp indicated by the box in Fig. 3 seems to be redundant. If these 4 bp are assigned to an intron, as they are in A. niger S29, a frame shift occurs at exon-5, resulting in a shift in the stop codon (i.e., removal of the redundant italic sequence at the 3′-side in Fig. 3). The numbers of base pairs then match, and the corrected amino acid sequence is 56 aa long, which is common to a wide range of eukaryotes. The corrected amino acid sequence of S29 showed more than 90% similarity to those of A. clavatus and A. nidulans. The corrected mass of the S29 ion ([M+H]+) was calculated as 6646.7 Da, and the corresponding peak was clearly observed in the mass spectra, as shown in Fig. 1. The same procedure was performed for S29 of A. fumigatus A1163, revealing the same sequence and mass as those of the Af293 strain.
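The two criteria used above for S29, a length of 56 amino acids and the presence of a C-x-x-C motif, can be expressed as a simple screen. The sketch below is only illustrative: the sequences are placeholders rather than real S29 sequences, and the function names are hypothetical.

```python
import re

ZINC_FINGER_LIKE = re.compile(r"C..C")  # C-x-x-C, where x is any residue


def looks_like_s29(sequence, expected_length=56):
    """True if the sequence has the expected length and a C-x-x-C motif."""
    return (len(sequence) == expected_length
            and ZINC_FINGER_LIKE.search(sequence) is not None)


def flag_suspect_entries(entries, expected_length=56):
    """Return the names of entries that fail the length/motif screen."""
    return sorted(name for name, seq in entries.items()
                  if not looks_like_s29(seq, expected_length))


# Placeholder sequences only; real ones would come from UniProtKB.
conserved_like = "A" * 20 + "CAAC" + "A" * 32          # 56 residues, motif present
entries = {
    "strain_1": conserved_like,
    "strain_2": conserved_like + "A" * 20,             # C-terminal extension
    "strain_3": "A" * 56,                              # motif missing
}
suspects = flag_suspect_entries(entries)
```

Entries flagged in this way are candidates for the gene-level inspection of exon/intron boundaries, not proof of an error by themselves.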
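The effect of a 4 bp error in an exon boundary can be reproduced with a toy gene. The sketch below is not the rps29 sequence: the gene, the exon coordinates, and the function names are invented for illustration, and only the standard genetic code is assumed. Both annotations of the toy gene give introns that start with GT and end with AG, yet only one gives an in-frame stop codon.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def splice(gene, exons):
    """Join exon segments given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(gene[start:end] for start, end in exons)


def introns_follow_gt_ag(gene, exons):
    """Check that every intron between consecutive exons is GT...AG."""
    introns = [gene[exons[i][1]:exons[i + 1][0]] for i in range(len(exons) - 1)]
    return all(s.startswith("GT") and s.endswith("AG") for s in introns)


def translate(cds):
    """Translate until the first stop codon; return (protein, stop_found)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


# Toy gene: exon (6 bp) + intron (8 bp) + exon (6 bp). Entirely hypothetical.
gene = "ATGGCT" + "GTAAGTAG" + "AAATGA"
correct_exons = [(0, 6), (14, 20)]
misannotated_exons = [(0, 10), (14, 20)]   # first exon annotated 4 bp too long

correct_protein = translate(splice(gene, correct_exons))
shifted_protein = translate(splice(gene, misannotated_exons))
```

With the mis-annotated boundary the reading frame shifts and the stop codon is lost, which changes both the length and the C-terminal sequence of the predicted protein.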
The sequence information of L39 of both the Af293 and A1163 strains was not registered in the protein databases. We therefore searched for the open reading frame (ORF) of the L39 gene (rpl39) in the genome sequences of the Af293 and A1163 strains, using the rpl39 gene sequences of other Aspergillus species and manual inspection. A BLAST search performed using known rpl39 gene sequences found highly homologous sequences in chromosome 5 of A. fumigatus Af293 (NC_007198.1, c1443605-1444119) and ctg_000043 of A1163 (ABDB01000043.1, c422041-421524). An alignment analysis of the putative rpl39 gene sequences with those of several Aspergillus species gave the exon/intron structure and a total of 156 bp of CDS. The resulting amino acid sequences were the same between the Af293 and A1163 strains, and also the same as L39 of A. oryzae RIB40 and A. flavus AF70. The theoretical mass of the L39 ion ([M+H]+) was determined as 6151.2 Da, and the corresponding peak was observed as shown in Fig. 1. These results strongly support the inferred sequence and expressed mass of L39 of the A. fumigatus strains.
In this manner, the verification of A. fumigatus RSPs under 20,000 Da could be performed by a combination of manual sequence inspections and MALDI-TOF MS measurements. Surprisingly, more than half (17 of 31) of the RSPs were incorrectly registered in the public protein databases, mainly due to erroneous annotations of exon/intron structures. In addition, two RSPs were registered as fusion proteins, and L39 was absent. The corrected CDS and amino acid sequences of these 17 RSPs are summarized in the supporting information Table SI-1 for A. fumigatus Af293 and Table SI-2 for A. fumigatus A1163.
The automatic annotation of exon/intron structures after whole-genome sequencing is likely to be imperfect, since the only clue applied to determine introns is the GT-AG rule (most introns start with GT and end with AG). Because accurate determination of cDNA by mRNA sequencing is both expensive and time-consuming, a full set of experimental cDNA sequence data of Aspergillus RSPs has not yet been reported. Our approach appears to be a simple and effective method of inferring accurate amino acid sequences of RSPs.
Post-translational modifications
Unidentified RSPs still remained after sequence correction, suggesting the presence of post-translational modifications. In this study, post-translational modifications could be inferred for 11 RSPs, as described in this section. These modifications appear to be conserved in eukaryotes.
Acetylation, especially at the N-terminus, seems to be a common post-translational modification in eukaryotic RSPs. Nine RSPs (L31, L35, S11, S15, S16, S18, S21, S24, and S28) showed clear peaks at +42 Da over the calculated sequence mass, suggesting acetylation. For example, although the amino acid sequence of S21 is slightly different between Af293 and A1163 strains, clear peaks are seen in the +42 Da position for both samples, as shown in Fig. 4.
In yeast RSPs, when the penultimate amino acid residue is serine, N-terminal methionine loss followed by N-terminal acetylation is likely to occur.24,25) Among the nine RSPs that are probably acetylated, L31, L35, and S18 have an MS- sequence at the N-terminus. In yeast RSPs, S21 with ME- and S28 with MD- are acetylated.25) This information strongly suggests the acetylation of S21 and S28 of the A. fumigatus strains, which have the same N-terminal sequences. Yeast S11, S15, S16, and S24, which have MS- sequences, are N-acetylated.25) Rat S11 (in UniProtKB, P62282) and S15 (ref. 26), which have MA- sequences, also appear to be N-acetylated. Therefore, S11 and S15 (and probably also S16) with MA- sequences are likely to be N-acetylated.
Methylation is another possible post-translational modification of RSPs. Methylation of L42 at Lys-55 is evolutionarily conserved among eukaryotes.27) Because sequence homology around Lys-55 is high (yeast Lys-55 corresponds to Lys-50 of A. fumigatus by similarity), methylation is likely to be a post-translational modification of L42 of A. fumigatus. A clear peak was in fact observed around m/z 12028.3, which corresponds to the calculated sequence mass plus 14 Da.
Prolyl dihydroxylation of eukaryotic S23 is known as an evolutionarily conserved modification,28) and Pro-64 is hydroxylated in yeast S23. The high sequence homology around Pro-64 suggests that S23 of the A. fumigatus strains is also hydroxylated, resulting in a +32 Da shift. The corresponding peaks could be clearly observed around m/z 15802.5.
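The modifications discussed in this section can be screened by testing which mass shift brings the sequence mass onto an observed peak. In the sketch below, the shifts are rounded average values, the 1.5 Da tolerance is an assumption, and the sequence masses are back-calculated by us from Table 2 (Af293) by subtracting the nominal shift from the listed calculated mass; the function name is hypothetical.

```python
# Rounded average mass shifts in Da for the modifications discussed in the text.
MODIFICATION_SHIFTS = {
    "none": 0.0,
    "+Ac": 42.04,       # acetylation
    "+Me": 14.03,       # methylation
    "+Hyd (2)": 32.00,  # dihydroxylation (two hydroxyl oxygens)
}


def explain_peak(sequence_mass, observed_mass, tol_da=1.5):
    """Return the modification labels whose shift matches the observed peak
    within an absolute tolerance (tol_da is an assumed value)."""
    return [label for label, shift in MODIFICATION_SHIFTS.items()
            if abs(sequence_mass + shift - observed_mass) <= tol_da]


# (back-calculated sequence mass, observed mass for Af293 from Table 2)
examples = {
    "S21": (10010.2, 10053.2),
    "L42": (12014.3, 12028.1),
    "S23": (15770.5, 15801.9),
}
explanations = {name: explain_peak(seq_mass, obs)
                for name, (seq_mass, obs) in examples.items()}
```

A mass match of this kind only suggests a modification; in this paper each assignment is additionally supported by a conserved modification reported for the homologous yeast or rat protein.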
List of ribosomal protein biomarkers and its applicability
In this way, we could finally confirm the masses of 31 of the 50 expressed RSPs under 20,000 Da. Most of the intense peaks observed under m/z 20,000 could be identified, as shown in Fig. 1. The remaining RSPs probably went unidentified because of low ionization efficiency due to their acidic properties, or because of uncharacterized post-translational modifications (we found more putative methylated and acetylated RSPs, but they are omitted from this paper due to a lack of supporting references). Tables 2 and 3 summarize the assigned ribosomal proteins of the A. fumigatus Af293 and A1163 strains, together with calculated masses and possible post-translational modifications. Almost all identified RSPs have the same sequence and mass in both strains; the exception is S21, which differs by a single amino acid.
Yeast name | New system | Accession No. in UniProt | pI | Calculated mass as [M+H]+ | Observed mass | Sequence correction | Modifications (a) |
---|---|---|---|---|---|---|---|
L40 | eL40 | A4D9S6 | 9.5 | 6002.3 | 6001.7 | yes | |
L39 | eL39 | — | 12.6 | 6151.2 | 6151.4 | yes | −Met |
S29 | uS14 | Q4WLQ2 | 10.1 | 6646.7 | 6646.5 | yes | −Met |
S30 | eS30 | Q4WCU4 | 11.5 | 6789.1 | 6789.5 | | −Met |
L29 | eL29 | Q4WKA9 | 11.6 | 7456.6 | 7457.0 | | −Met |
S28 | eS28 | Q4WGB8 | 10.9 | 7710.0 | 7710.0 | yes | +Ac |
S31 | eS31 | Q4WXZ8 | 9.8 | 9134.9 | 9133.9 | yes | |
L38 | eL38 | Q4WP31 | 10.3 | 9153.8 | 9154.5 | | −Met |
L43 | eL43 | Q4WZH8 | 10.5 | 10025.8 | 10024.5 | | −Met |
S21 | eS21 | Q4WI01 | 8.5 | 10052.2 | 10053.2 | | +Ac |
L37 | eL37 | Q4WWR1 | 11.0 | 10386.9 | 10386.4 | yes | −Met |
L30 | eL30 | Q4X1P9 | 9.9 | 11171.1 | 11170.9 | | −Met |
L36 | eL36 | Q4WNZ0 | 11.9 | 11869.8 | 11869.4 | yes | −Met |
L42 | eL42 | Q4X205 | 10.5 | 12028.3 | 12028.1 | yes | −Met, +Me |
L33 | eL33 | Q4WX73 | 10.2 | 12215.1 | 12215.2 | yes | −Met |
L34 | eL34 | Q4WI54 | 9.6 | 13164.5 | 13164.4 | yes | −Met |
S26 | eS26 | Q4WJ94 | 10.9 | 13338.7 | 13337.7 | | −Met |
L31 | eL31 | Q4WLK1 | 10.5 | 13919.1 | 13918.5 | | −Met, +Ac |
L35 | uL29 | Q4WT53 | 11.1 | 14532.0 | 14533.0 | yes | −Met, +Ac |
L32 | eL32 | Q4WZN0 | 11.3 | 14836.6 | 14835.9 | | −Met |
L26 | uL24 | Q4WM42 | 10.9 | 14979.4 | 14979.1 | yes | −Met |
S24 | eS24 | Q4WAQ6 | 10.7 | 15226.6 | 15226.0 | | −Met, +Ac |
L27 | eL27 | Q4WJD7 | 10.5 | 15682.6 | 15683.5 | yes | |
S23 | uS12 | Q873W8 | 10.5 | 15802.5 | 15801.9 | | −Met, +Hyd (2) |
S16 | uS9 | Q4X1C0 | 10.2 | 15883.4 | 15881.9 | | −Met, +Ac |
S17 | eS17 | Q4X1E0 | 10.0 | 16089.5 | 16087.9 | yes | −Met |
S19 | eS19 | Q4WJN7 | 9.6 | 16351.4 | 16350.6 | yes | −Met |
L28 | uL15 | Q4WWF0 | 10.4 | 16631.1 | 16630.9 | | −Met |
S15 | uS19 | Q4X1G1 | 10.1 | 17626.5 | 17626.5 | | −Met, +Ac |
S18 | uS13 | Q4WLH1 | 10.5 | 17779.5 | 17780.5 | yes | −Met, +Ac |
S11 | uS17 | Q4WHU8 | 10.8 | 18478.6 | 18480.7 | yes | −Met, +Ac |
(a) −Met: N-terminal methionine loss, +Ac: acetylation, +Me: methylation, +Hyd: hydroxylation.
Yeast name | New system | Accession No. in UniProt | pI | Calculated mass as [M+H]+ | Observed mass | Sequence correction | Modifications (a) |
---|---|---|---|---|---|---|---|
L40 | eL40 | B0XNB9 | 9.5 | 6002.3 | 6001.9 | yes | |
L39 | eL39 | — | 12.6 | 6151.2 | 6151.5 | yes | −Met |
S29 | uS14 | B0Y8V8 | 10.1 | 6646.7 | 6646.0 | yes | −Met |
S30 | eS30 | B0YDK3 | 11.5 | 6789.1 | 6790.8 (b) | | −Met |
L29 | eL29 | B0XMW3 | 11.6 | 7456.6 | 7457.5 | | −Met |
S28 | eS28 | B0YCF7 | 10.9 | 7710.0 | 7709.7 | yes | +Ac |
S31 | eS31 | B0XXM3 | 9.8 | 9134.9 | 9135.0 | yes | |
L38 | eL38 | B0Y5X8 | 10.3 | 9153.8 | 9154.8 | | −Met |
L43 | eL43 | B0XVB7 | 10.5 | 10025.8 | 10025.2 (b) | | −Met |
S21 | eS21 | B0XUN2 | 8.5 | 10038.1 | 10037.7 | | +Ac |
L37 | eL37 | B0XYW1 | 11.0 | 10386.9 | 10385.9 | yes | −Met |
L30 | eL30 | B0XRW6 | 9.9 | 11171.1 | 11170.7 | | −Met |
L36 | eL36 | B0Y5T6 | 11.9 | 11869.8 | 11869.3 | yes | −Met |
L42 | eL42 | B0XWA6 | 10.5 | 12028.3 | 12027.9 (b) | yes | −Met, +Me |
L33 | eL33 | B0XYE5 | 10.2 | 12215.1 | 12214.8 | yes | −Met |
L34 | eL34 | B0XUB0 | 9.6 | 13164.5 | 13166.1 | yes | −Met |
S26 | eS26 | B0XPP9 | 10.9 | 13338.7 | 13336.9 | | −Met |
L31 | eL31 | B0XLZ0 | 10.5 | 13919.1 | 13918.5 | | −Met, +Ac |
L35 | uL29 | B0XQH1 | 11.1 | 14532.0 | 14531.4 | yes | −Met, +Ac |
L32 | eL32 | B0XV02 | 11.3 | 14836.6 | 14836.3 | | −Met |
L26 | uL24 | B0Y8G7 | 10.9 | 14979.4 | 14979.3 | yes | −Met |
S24 | eS24 | B0YC29 | 10.7 | 15226.6 | 15225.9 | | −Met, +Ac |
L27 | eL27 | B0XPJ7 | 10.5 | 15682.6 | 15682.3 | yes | |
S23 | uS12 | B0XQ66 | 10.5 | 15802.5 | 15802.2 | | −Met, +Hyd (2) |
S16 | uS9 | B0XS84 | 10.2 | 15883.4 | 15884.1 (b) | | −Met, +Ac |
S17 | eS17 | B0XS66 | 10.0 | 16089.5 | 16089.2 | yes | −Met |
S19 | eS19 | B0XP26 | 9.6 | 16351.4 | 16350.7 | yes | −Met |
L28 | uL15 | B0XZ73 | 10.4 | 16631.1 | 16630.4 | | −Met |
S15 | uS19 | B0XS46 | 10.1 | 17626.5 | 17626.0 | | −Met, +Ac |
S18 | uS13 | B0XM75 | 10.5 | 17779.5 | 17779.2 | yes | −Met, +Ac |
S11 | uS17 | B0XUT5 | 10.8 | 18478.6 | 18479.8 | yes | −Met, +Ac |
(a) −Met: N-terminal methionine loss, +Ac: acetylation, +Me: methylation, +Hyd: hydroxylation. (b) Shoulder peak.
To confirm the applicability of the reference mass list, RSPs of the neotype strain IFM 57323NT and a clinical isolate, IFM 62104, were further characterized. Because the criterion for species identification is similarity to the type strain, the characterization of RSPs of IFM 57323NT is important for establishing a reliable biomarker list for the identification of A. fumigatus. The characterization of the clinical isolate IFM 62104, which had already been identified as A. fumigatus, was performed as a demonstration of the analysis of real samples.
Figure 5 shows the partial mass spectra of ribosomal protein fractions obtained from (a) Af293, (b) A1163, (c) IFM 57323NT, and (d) IFM 62104 (whole mass spectra of IFM 57323NT and IFM 62104 are shown in Figs. SI-1 and SI-2 in the supporting information). In this mass range, seven identified RSPs (S31, L38, L43, S21, L37, L30, and L36) are commonly observed. Of the two types of S21, the peak for IFM 57323NT and IFM 62104 appeared at the same position as S21 of A1163. In the entire mass spectra, all 31 RSP biomarkers could be observed for the IFM 57323NT and IFM 62104 strains. These results suggest that the reference mass list can be used as a clue for the species identification of A. fumigatus.
In this study, we have investigated the actual state of RSPs in the public protein databases by characterizing the RSPs of genome-sequenced strains of A. fumigatus Af293 and A1163. As a result, we could solve the problems of the registered information of RSPs in the public protein databases.
As for the first problem, the confusion of the nomenclature, all the RSP names were verified and unified to the yeast-based names, which are the most prevalent in the public protein databases (the names under the new unified naming system15) are also listed). As for the second problem, which originates from incorrect sequence information, we have pointed out that more than half of the registered A. fumigatus RSP sequences are incorrect, mainly due to mis-annotation of exon/intron structures. Because RSPs are highly conserved, we could easily find candidates for the correct sequences and verify them by comparing the theoretical mass with the observed mass. In addition, post-translational modifications such as acetylation and methylation could also be confirmed.
By solving these problems, we have successfully completed the reference mass list of two genome-sequenced strains of A. fumigatus. By using the completed sequence information of the RSPs of A. fumigatus as a reference, information on the RSPs of other related fungal strains can be more easily verified by combining in silico inspection with MALDI-TOF MS measurements. We are proceeding with the characterization of RSPs of other Aspergillus genome-sequenced strains to make reliable lists of biomarker RSPs for identification of Aspergillus species. Once the Aspergillus RSP biomarker lists have been compiled, ribosomal protein-based MALDI-TOF MS is anticipated to be a powerful and reliable tool in the field of clinical microbiology.
This work was supported in part by a research grant from the Institute for Fermentation, Osaka (IFO), JSPS Kakenhi Grant Number 25430198, and the National Bioresource Project (Pathogenic Microbes) in Japan (http://www.nbrp.jp/).