Mass Spectrometry
Online ISSN : 2186-5116
Print ISSN : 2187-137X
ISSN-L : 2186-5116
Original Article
Verification of Ribosomal Proteins of Aspergillus fumigatus for Use as Biomarkers in MALDI-TOF MS Identification
Sayaka Nakamura, Hiroaki Sato, Reiko Tanaka, Takashi Yaguchi

2016 Volume 5 Issue 1 Pages A0049

Abstract

We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the masses of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of the information on RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from the genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we found that the public protein databases have several problems. RSP names are inconsistent, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have the potential to be useful biomarkers for identifying clinical isolates of A. fumigatus.

INTRODUCTION

Aspergillus is a diverse genus of very common fungi that have high economic and social impact.1) Some strains are used industrially for microbial fermentation and production of organic compounds and enzymes. Several Aspergillus species are also known to be causative agents of mycoses (aspergilloses), including allergic bronchopulmonary aspergillosis, aspergilloma, and invasive aspergillosis.2) Because susceptibilities to antifungal agents vary according to Aspergillus species, accurate identification of unknown Aspergillus clinical isolates is the key to selecting an appropriate antifungal agent.

Identification of Aspergillus species has been traditionally performed based on the morphology of the conidia and conidiogeneses.1,2) However, morphological discrimination is subjective and requires special skills and experience. This has led to the increasing use of DNA-based characterizations to determine Aspergillus species. Identification of Aspergillus species has been reported using the internal transcribed spacer (ITS) region between the 18S, 5.8S, and 28S ribosomal RNA (rRNA) genes,3) the D1/D2 region of the 28S rRNA gene4) and the housekeeping genes such as β-tubulin5) and calmodulin6) genes.

On the other hand, we have proposed a ribosomal protein-based MALDI-TOF MS method for bacterial characterization.7–14) Our method can identify the species of a bacterium based on the profiles of its ribosomal subunit proteins (RSPs), which are highly abundant housekeeping proteins and are easily observed by MALDI-TOF MS. The results of identification at the species level and discrimination at the strain level are correlated with the molecular evolution of these housekeeping proteins. Prokaryotic (bacterial) ribosomes consist of more than 50 subunit proteins, so using RSPs as biomarkers gives results equivalent to analyzing many genes. The key to the RSP-based method is the reliability of the reference mass list of RSP biomarkers, the preparation of which is supported by bioinformatics. The theoretical mass of RSP biomarkers can be calculated from their amino acid sequences registered in public protein databases such as the National Center for Biotechnology Information (NCBI) database and the UniProt Knowledgebase (UniProtKB). Therefore, this method has the potential for universal use, since it is not limited by commercial databases.

To extend this ribosomal protein-based method to the identification of eukaryotic Aspergillus species, we first attempted to characterize the RSPs of various genome-sequenced Aspergillus strains by MALDI-TOF MS. However, most RSPs in every strain were difficult to assign. We found that the difficulty is mainly caused by two problems in the public protein databases.

The first problem originates from confusion in the nomenclature of fungal RSPs. Prokaryotic (bacterial) ribosomes consist of 57 kinds of RSPs, whereas eukaryotic ribosomes typically consist of 78 RSPs. This difference in number leads to disagreements in the names of RSPs. So far, one nomenclature has been proposed for prokaryotes based on Escherichia coli, while two nomenclatures have been proposed for eukaryotes, based on yeast and rat. Names based on these different nomenclatures are now mixed together, which makes it difficult to search databases and references by RSP name. Although a unified naming system for RSPs has also been proposed,15) this proposal is not employed in the public protein databases at this time.

The second problem is that many amino acid sequences in the databases seem to be incorrect. Unlike prokaryotic genes, the genes of eukaryotes, including Aspergillus fungi, contain introns. We performed a homology analysis of the RSPs of Aspergillus species and found regions of low homology in the amino acid sequences. Because housekeeping ribosomal proteins should be highly conserved, we speculated that the intron sequences may have been mis-annotated. The sequences of RSPs could therefore be corrected by combining in silico inspection by sequence homology analysis with verification of the expressed mass of RSPs by MALDI-TOF MS measurements.

In this paper, we describe the detailed procedures for verifying and correcting the information on RSPs (i.e., protein names, intron sequences, amino acid sequences, and post-translational modifications) using two genome-sequenced strains of A. fumigatus as a model.

EXPERIMENTAL

Cell culture and preparation of ribosomal protein samples

The genome-sequenced strains of A. fumigatus, Af293 (=IFM 54229) and A1163 (=IFM 53842), the neotype strain IFM 57323NT, and a clinical isolate, IFM 62104, were provided by Chiba University’s Medical Mycology Research Center. The genome-sequenced strains and IFM 57323NT were grown in potato dextrose broth (PDB) medium at 25°C for three days. The IFM 62104 strain was grown in PDB medium at 37°C for four days.

After incubation, the culture was centrifuged at 5,800 g for 10 min to harvest the fungal bodies, which were then ground (twice, for 20 s each time, at 7,000 rpm) with zirconia silica beads (ca. 1,300 mg, 0.1 mm in diameter) in a MagNA Lyser (Roche). After removing the beads and cell debris by centrifugation, the fungal lysates were subjected to ultracentrifugation at 73,400 g for 1 h to isolate the ribosome fraction as a precipitate. The resulting ribosome fraction was solubilized in 20–50 μL of 50% acetonitrile containing 1% trifluoroacetic acid (TFA), and then subjected to MALDI-TOF MS measurements.

MALDI-TOF MS measurements

Sample preparation, apparatus, and MALDI-TOF MS data acquisition methods were similar to those described in our previous papers.7–14) The ribosomal protein sample solution (approx. 1 μL) was spotted onto the MALDI target. Approx. 1 μL of sinapinic acid matrix solution, at a concentration of 20 mg/mL in 50% acetonitrile with 1% trifluoroacetic acid, was then overlaid and dried in air. The MALDI-TOF MS measurements were performed using an AXIMA CFR-plus time-of-flight mass spectrometer (Shimadzu/Kratos, Kyoto, Japan) in positive linear mode. More than three mass spectra for each sample were collected from more than three sample spots. External mass calibration was carried out using three peaks as references: ACTH (human, 1–24) ([M+H]+, m/z 2932.6) and myoglobin ([M+H]+, m/z 16952.6 and [M+2H]2+, m/z 8476.8).
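The two myoglobin calibration peaks quoted above are related by simple charge-state arithmetic, which can be checked with a short sketch. The function name `mz_from_mh` and the proton mass constant are illustrative assumptions, not part of the original procedure.

```python
PROTON = 1.00728  # mass of a proton in Da (assumed constant for this sketch)


def mz_from_mh(mh: float, charge: int) -> float:
    """Convert an [M+H]+ value to the m/z of the corresponding [M+zH]z+ ion."""
    neutral = mh - PROTON                      # neutral mass M
    return (neutral + charge * PROTON) / charge


# Myoglobin [M+H]+ from the text; the doubly charged ion should appear near m/z 8476.8.
print(round(mz_from_mh(16952.6, 2), 1))  # prints 8476.8
```

The same relation can help recognize multiply charged ions of abundant proteins so that they are not mistaken for separate singly charged species.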

Calculation of the theoretical mass of RSPs

The amino acid sequence of each RSP was obtained from the UniProtKB (http://www.uniprot.org/). The sequence mass of each RSP was predicted using a Compute pI/Mw tool on the ExPASy proteomics server (http://www.expasy.org/tools/pi_tool.html), with N-terminal methionine loss considered first as a possible post-translational modification. The possibilities of other modifications will be discussed below in Results and Discussion section. The theoretical mass of each expressed RSP was calculated as [M+H]+ ion.
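As a rough illustration of this calculation, the sketch below sums average residue masses, adds one water for the chain termini and one proton for the [M+H]+ ion, and optionally removes the N-terminal methionine. It is not the Compute pI/Mw tool itself; the function name, the toy sequence, and the simplification that the initiator methionine is always removed when requested are assumptions made for the example.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # one H2O per chain (free N- and C-termini)
PROTON = 1.00728   # charge carrier of the [M+H]+ ion


def theoretical_mh(sequence: str, remove_initiator_met: bool = True) -> float:
    """Return the average-mass [M+H]+ of an otherwise unmodified protein chain."""
    seq = sequence.strip().upper()
    if remove_initiator_met and seq.startswith("M"):
        seq = seq[1:]  # simplification: always drop the N-terminal Met when asked
    neutral = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER
    return neutral + PROTON


if __name__ == "__main__":
    toy = "MSGRK"  # hypothetical sequence, not a real RSP
    print(round(theoretical_mh(toy), 1))
    print(round(theoretical_mh(toy, remove_initiator_met=False), 1))
```

Computing both variants (with and without the N-terminal methionine) gives two candidate masses per protein to compare against the observed peaks.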

RESULTS AND DISCUSSION

Unification of the RSP name system

The nomenclature of RSPs is in a state of confusion. Names are typically composed of a letter (L for large subunit proteins and S for small subunit proteins) and a number, with numbering rules that differ between species. The first nomenclature of RSPs was proposed for bacterial (Escherichia coli) RSPs in 1971.16) For eukaryotic RSPs, mammalian (rat) RSPs were the first to be characterized and named,17) and the proposal for the yeast (Saccharomyces cerevisiae) RSP naming system18) followed. To resolve the nomenclatural confusion, a unified naming system for RSPs has been discussed, in which homologous RSPs are assigned the same name, independent of organism species. The first proposal was based on protein families,19) and it was further modified into a new system for naming RSPs proposed in 2014.15) Unfortunately, the new unified naming system15) is not employed in the public protein databases at this time. This paper therefore provisionally adopts the yeast naming system18) for convenience of homology search, since Aspergillus and Saccharomyces are related organisms.

To unify the name of each A. fumigatus RSP with the yeast name, a homology search of A. fumigatus RSPs against the RSPs of S. cerevisiae was performed using the NCBI blastp program (http://blast.ncbi.nlm.nih.gov/). Table 1 summarizes the data on A. fumigatus RSPs: the accession number and registered name in UniProtKB, the name under the yeast naming system, and the name under the unified naming system as a reference for the future. Most of the RSPs of A. fumigatus registered in UniProtKB were named using the yeast naming system. The remaining RSPs, named using another naming system, were renamed to the yeast name in the following manner. For example, L37a of A. fumigatus Af293, registered in UniProtKB as Q4WZH8, showed high homology with S. cerevisiae L43A (where A denotes one of the duplicate genes). Because L37a is based on the mammalian ribosome name, it is renamed L43 in line with the yeast name (incidentally, it corresponds to eL43 in the unified name15)). This L43 protein showed more than 95% similarity to L43 of A. clavatus NRRL1, A. terreus NIH2624, and A. niger CBS513.88. These homologs in other Aspergillus species are registered using the yeast name. To prevent such confusion, all RSPs of A. fumigatus Af293 and A1163 were unified to the yeast name.

Table 1. Correct names of ribosomal proteins of A. fumigatus strains and their accession No. in UniProt.
Yeast name | Unified name | Designation in UniProt (Af293) | Accession No. in UniProt (Af293) | Designation in UniProt (A1163) | Accession No. in UniProt (A1163)
Large subunit proteins
L1 | uL1 | Ribosomal protein | E9QU85 | Ribosomal protein | B0XQU0
L2 | uL2 | L8, putative | Q4WTW7 | L8, putative | B0Y3E2
L3 | uL3 | L3 | Q8NKF4 | L3 | B0XSL2
L4 | uL4 | L4, putative | Q4WEH4 | L4 | B0Y2P9
L5 | eL18 | L5, putative | Q4WSG1 | L5 | B0XR75
L6 | eL6 | L6 | Q4WSZ2 | L6 | B0XQN2
L7 | uL30 | L7 | Q4W9S6 | L7 | B0YEG9
L8 | eL8 | L7A | Q4WLM5 | L7A | B0XM24
L9 | uL6 | L9, putative | Q4WTJ3 | L9, putative | B0XQ32
L10 | uL16 | L10 | Q4X1P8 | L10 | B0XRW7
L11 | uL5 | L11 | Q4WP20 | L11 | B0Y5W6
L12 | uL11 | L12 | Q4WK81 | L12 | B0XMZ1
L13 | eL13 | L13 | Q4W9L9 | L13 | B0YEB1
L14 | eL14 | L14 | Q4WD82 | L14 | B0YD67
L15 | eL15 | L15 | Q4WJV5 | L15 | B0XNP4
L16 | uL13 | L16a | Q4WJH1 | L16a | B0XPG3
L17 | uL22 | L17 | Q6MY48 | L17 | B0XMS0
L18 | eL18 | L18 | Q4X279 | L18 | B0XW36
L19 | eL19 | L19 | Q4X220 | L19 | B0XW91
L20 | eL20 | L20 | Q4WJW9 | L20 | B0XNN1
L21 | eL21 | L21, putative | Q4WWT1 | L21, putative | B0XYU3
L22 | eL22 | L22, putative | Q4WYA0 | L22, putative | B0XWY6
L23 | uL14 | Alkaline serine protease | Q4WI20 | Alkaline serine protease | B0XUE5
L24 | eL24 | L24a | Q4WCU3 | L24a | B0YDK4
L25 | uL23 | L23 | Q4WTP5 | L23 | B0Y372
L26 | uL24 | L26 | Q4WM42 | L26 | B0Y8G7
L27 | eL27 | L27 | Q4WJD7 | L27e | B0XPJ7
L28 | uL15 | L27a, putative | Q4WWF0 | L27a, putative | B0XZ73
L29 | eL29 | L29, putative | Q4WKA9 | L29, putative | B0XMW3
L30 | eL30 | L30, putative | Q4X1P9 | L30, putative | B0XRW6
L31 | eL31 | L31e | Q4WLK1 | L31e | B0XLZ0
L32 | eL32 | L32 | Q4WZN0 | L32 | B0XV02
L33 | eL33 | L35Ae | Q4WX73 | L35Ae | B0XYE5
L34 | eL34 | L34, putative | Q4WI54 | L34 protein, putative | B0XUB0
L35 | uL29 | L35 | Q4WT53 | L35 | B0XQH1
L36 | eL36 | L36 | Q4WNZ0 | L36 | B0Y5T6
L37 | eL37 | L37 | Q4WWR1 | L37 | B0XYW1
L38 | eL38 | L38, putative | Q4WP31 | Rpl38, putative | B0Y5X8
L39 | eL39 |  |  |  | 
L40 | eL40 | Ubiquitin UbiA, putative | A4D9S6 | Ubiquitin UbiA, putative | B0XNB9
L42 | eL42 | L44 | Q4X205 | Uncharacterized protein | B0XWA6
L43 | eL43 | L37a | Q4WZH8 | L37a | B0XVB7
P0 | uL10 | P0 | Q4WJR3 |  | 
P1 | P1/P2 | P1 | Q9HGV0 | P1 | B0XPQ5
P2 | P1/P2 | P2 | Q9UUZ6 | P2/allergen Asp F 8 | B0XS47
Small subunit proteins
S0 | uS2 | S0 | Q4WYK1 | S0 | B0XWG9
S1 | eS1 | S1 | Q4WTM9 | S1 | B0Y356
S2 | uS5 | S5 | Q4WAI8 | S5 | B0YBW2
S3 | uS3 | S3, putative | Q4WJK8 | S3, putative | B0XP55
S4 | eS4 | S4 | Q4WWR9 | S4 | B0XYV4
S5 | uS7 | S5, putative | Q4WRU9 | S5, putative | B0XN49
S6 | eS6 | S6 | Q4WPX5 | S6 | B0Y6R5
S7 | eS7 | S7e | Q4WXU5 | S7e | B0XXS8
S8 | eS8 | S8 | Q4WJZ0 | S8 | B0XNE5
S9 | uS4 | S9 | Q4WWT2 | S9 | B0XYU2
S10 | eS10 | S10b | Q4WLQ8 | S10b | B0Y8V2
S11 | uS17 | S11 | Q4WHU8 | S11 | B0XUT5
S12 | eS12 | S12 | Q4WJM1 | S12 | B0XP41
S13 | uS15 | S13 | Q4WGJ9 | S13 | B0YCP0
S14 | uS11 | S11 | Q4X1C6 | S11 | B0XS79
S15 | uS19 | S15, putative | Q4X1G1 | S15, putative | B0XS46
S16 | uS9 | Rps16, putative | Q4X1C0 | S9 | B0XS84
S17 | eS17 | S17, putative | Q4X1E0 | S17, putative | B0XS66
S18 | uS13 | S13p/S18e | Q4WLH1 | S13p/S18e | B0XM75
S19 | eS19 | S19 | Q4WJN7 | S19 | B0XP26
S20 | uS10 | S10a | Q4WIE3 | S10a | B0XTV5
S21 | eS21 | S21 | Q4WI01 | S21 | B0XUN2
S22 | uS8 | S22 | Q4WRN1 | S22 | B0XNI4
S23 | uS12 | S23 | Q873W8 | S23 (S12) | B0XQ66
S24 | eS24 | S24 | Q4WAQ6 | S24 | B0YC29
S25 | eS25 | S25, putative | Q4WRF2 |  | 
S26 | eS26 | S26 | Q4WJ94 | S26 | B0XPP9
S27 | eS27 | S27 | Q4WWP9 | S27 | B0XYX4
S28 | eS28 | S28e | Q4WGB8 | S28e | B0YCF7
S29 | uS14 | S29, putative | Q4WLQ2 | S29, putative | B0Y8V8
S30 | eS30 | S30/ubiquitin fusion | Q4WCU4 | S30/ubiquitin fusion | B0YDK3
S31 | eS31 | Ubiquitin (UbiC), putative | Q4WXZ8 | Ubiquitin (UbiC), putative | B0XXM3

Ribosomal proteins L40, S30, and S31 are synthesized as fusion proteins with ubiquitin20,21) (note that S31 is assigned as S27a in ref. 20). There are several different types of ubiquitin, all of which are highly conserved and well characterized, so identifying the ubiquitin part in a fusion protein sequence is an easy task. In UniProtKB, L40 is registered as “Ubiquitin UbiA” (accession numbers: A4D9S6 for Af293 and B0XNB9 for A1163). In this fusion protein, ubiquitin forms the N-terminal 76 amino acids, whereas L40 is the remaining C-terminal 52 amino acids.20) In the case of S31, registered as “Ubiquitin (UbiC)” (Q4WXZ8 and B0XXM3), the N-terminal 76 amino acids are ubiquitin and the remaining C-terminal portion is S31. Adding to the confusion, S30, which is registered as “S30/ubiquitin fusion” (Q4WCU4 and B0YDK3), is not a fusion protein; the full length of the registered amino acid sequence corresponds to S30.
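The separation of the ribosomal protein from its ubiquitin partner described above amounts to cutting the registered sequence after residue 76. A minimal sketch follows; the function name and the placeholder sequence are hypothetical, and only the lengths (76 and 52 amino acids) come from the text.

```python
UBIQUITIN_LENGTH = 76  # N-terminal ubiquitin length stated in the text


def split_ubiquitin_fusion(fusion_sequence: str, ubiquitin_length: int = UBIQUITIN_LENGTH):
    """Split a registered fusion sequence into (ubiquitin part, ribosomal protein part)."""
    if len(fusion_sequence) <= ubiquitin_length:
        raise ValueError("sequence is too short to contain a ubiquitin fusion partner")
    return fusion_sequence[:ubiquitin_length], fusion_sequence[ubiquitin_length:]


# Placeholder of the same length as the registered L40 entry (76 + 52 = 128 residues);
# the letters are dummies, not the real sequence.
placeholder = "U" * 76 + "R" * 52
ubiquitin_part, rsp_part = split_ubiquitin_fusion(placeholder)
print(len(ubiquitin_part), len(rsp_part))  # prints 76 52
```

Only the C-terminal part would then be passed to the mass calculation. For an entry such as S30, which is labeled as a fusion but is not one, the full registered sequence should be used instead.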

The page for alkaline serine protease in UniProtKB (Q4WI20) includes the “ribosomal protein L14P family” in the Family & Domains field. L14P is the bacterial RSP name, which corresponds to yeast L23. The amino acid sequence of this protein showed a high homology with L23 of S. cerevisiae, so the name of this protein was changed to L23. All the names of A. fumigatus RSPs were verified and changed to the yeast name using this procedure.

Observation of MALDI-TOF mass spectra and peak assignment

The next step is the calculation of the theoretical mass of each RSP based on the corresponding amino acid sequence obtained from UniProtKB. The theoretical mass was then compared with the observed mass. Figure 1 shows the mass spectra of the ribosomal protein fractions prepared from A. fumigatus Af293 and A1163, with the peaks under m/z 20,000 assigned. Although 31 RSPs were eventually assigned, at this stage only eight peaks could be assigned for each strain using the amino acid sequences registered in UniProtKB and taking only N-terminal methionine loss into account. These peaks are indicated by the boxed protein names in Fig. 1. In our previous studies of bacterial RSPs,7–14) most RSPs could be assigned by referring to the theoretical mass calculated from the registered amino acid sequences while considering only N-terminal methionine loss. The likely reasons why only eight RSPs could be assigned are that (1) many incorrect amino acid sequences are registered in the protein databases and (2) post-translational modifications other than N-terminal methionine loss occur. The following sections discuss the actual state of the registered information, how to correct erroneous sequences, and how to infer post-translational modifications.
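The comparison of theoretical and observed masses can be sketched as a nearest-peak lookup within a tolerance window. The tolerance of 2.5 Da, the function name `assign_peaks`, and the extra entry named "toy" are assumptions for illustration; the other masses are the Af293 values for S30, L29, and L38 listed in Table 2.

```python
def assign_peaks(theoretical, observed, tolerance_da=2.5):
    """Match each theoretical [M+H]+ to the nearest observed peak within a tolerance."""
    assignments = {}
    if not observed:
        return assignments
    for name, calculated in theoretical.items():
        nearest = min(observed, key=lambda mz: abs(mz - calculated))
        if abs(nearest - calculated) <= tolerance_da:
            assignments[name] = nearest
    return assignments


# Calculated [M+H]+ values from Table 2 plus one hypothetical unmatched entry.
theoretical = {"S30": 6789.1, "L29": 7456.6, "L38": 9153.8, "toy": 8000.0}
# Observed peaks from Table 2 (the last one belongs to S21, not listed above).
observed = [6789.5, 7457.0, 9154.5, 10053.2]

for name, peak in assign_peaks(theoretical, observed).items():
    print(name, peak, round(peak - theoretical[name], 1))
```

Entries that find no peak inside the window, like the hypothetical "toy" entry here, are the ones that would prompt a closer look at the registered sequence or at possible modifications.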

Fig. 1. MALDI mass spectra of ribosomal protein fractions obtained from (a) A. fumigatus Af293 and (b) A. fumigatus A1163. Boxed peak assignments indicate the RSPs assigned using the amino acid sequences registered in UniProtKB without considering any post-translational modifications except N-terminal methionine loss. The peak labels +Ac, +Me, and +Hyd indicate acetylation, methylation, and hydroxylation, respectively, as post-translational modifications.

Correction of registered amino acid sequences

Incorrectly registered amino acid sequences in bacterial RSPs were mainly caused by mis-annotation of start codons.9,12) In this study, we found that incorrect sequences of eukaryotic RSPs of A. fumigatus were caused by mis-annotation of the exon/intron structure. Accurate coding DNA sequence (CDS) was determined by a combination of informatics procedures involving a homology search and a manual inspection of the DNA sequence of the corresponding genes, followed by confirmation of the correct mass of the expressed RSPs by MALDI-TOF MS measurements. The details of the correction procedures are described below.

The amino acid sequences of RSPs tend to be highly conserved, and show high homology with the proteins of other species. However, RSPs that could not be assigned at the beginning tended to have different sequence lengths registered in the database. For example, Fig. 2 shows the multiple alignment of S29 of A. fumigatus, for which no peak could be observed at the calculated mass, with S29 of other Aspergillus species such as A. clavatus NRRL1, A. nidulans FGSC A4, and A. niger CBS513.88. The amino acid sequences between positions 1 and 54 are highly conserved between these strains, while the homology and length of the C-terminal side are markedly different. Eukaryotic S29 is highly conserved from yeast to humans,22) and has 56 amino acids containing a specific zinc finger-like motif (C-x-x-C).23) Since S29 of A. niger CBS513.88 and A. nidulans FGSC A4 have the zinc finger-like motif and a length of 56 amino acids, these sequences are more likely to be correct. The DNA sequence of the S29 gene (rps29) of A. fumigatus Af293 was therefore compared to that of A. niger CBS513.88.

Fig. 2. Comparison of amino acid sequences of S29. Bold sequences (C-x-x-C) identify the zinc finger-like motif.

The rps29 gene of A. niger CBS513.88 is located on c482296-481588 (708 bp) of supercontig An06 (NT_166522.1 in NCBI) and consists of 5 exons and 4 introns. The rps29 gene of A. fumigatus Af293 is located on c3211760-3211177 (583 bp) of chromosome 6 (NC_007199.1 in NCBI) and is also annotated with 5 exons and 4 introns. Figure 3 shows the sequence alignment of these genes, with exon regions underlined. In spite of the high sequence similarity from exon-1 to exon-3, the length of exon-4 is different: it is 57 bp for A. niger CBS513.88 and 61 bp for A. fumigatus Af293. Thus, the difference of 4 bp indicated by the box in Fig. 3 seems to be redundant. If these 4 bp are assigned to an intron, as they are in A. niger S29, a frame shift occurs at exon-5, resulting in a shift of the stop codon (i.e., removal of the redundant italic sequence at the 3′-side in Fig. 3). The numbers of base pairs then match, and the corrected amino acid sequence has 56 aa, which is common to a wide range of eukaryotes. The corrected amino acid sequence of S29 showed more than 90% similarity to that of A. clavatus and A. nidulans. The corrected mass of the S29 ion ([M+H]+) was calculated as 6646.7 Da, and the corresponding peak was clearly observed in the mass spectra, as shown in Fig. 1. The same procedure was performed for S29 of A. fumigatus A1163, revealing the same sequence and mass as those of the Af293 strain.
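The effect of moving a few bases from an exon into an intron can be illustrated with a toy gene. The sketch below uses the standard genetic code; the 16-bp sequence, the exon coordinates, and the function names are invented for the example and have nothing to do with the real rps29 sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(gene, exons):
    """Join exon slices (0-based, end-exclusive coordinates) into a coding sequence."""
    return "".join(gene[start:end] for start, end in exons)


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon or incomplete codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy gene: exon "ATGGCT", a 4-bp stretch "GTAG", then exon "TGTTAA".
gene = "ATGGCT" + "GTAG" + "TGTTAA"
mis_annotated = [(0, 16)]          # the 4 bp wrongly treated as coding sequence
corrected = [(0, 6), (10, 16)]     # the 4 bp treated as intron

print(translate(splice(gene, mis_annotated)))  # prints MAVVL (frame shifted, stop codon lost)
print(translate(splice(gene, corrected)))      # prints MAC
```

In the toy case, as in the S29 correction described above, reassigning 4 bp changes both the downstream residues and the position of the stop codon, which is why the resulting mass difference is large enough to be checked by MALDI-TOF MS.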

Fig. 3. Procedures for sequence correction of S29 of A. fumigatus Af293.

The sequence information of L39 of both Af293 and A1163 strains was not registered in the protein databases. We tried to find the open reading frame (ORF) of the L39 gene (rpl39) in the genome sequence of Af293 and A1163 strains using the rpl39 gene sequence of other Aspergillus species by manual inspection. As a result of a blast search performed using known rpl39 gene sequences, highly homologous sequences of rpl39 gene were found in chromosome 5 of A. fumigatus Af293 (NC_007198.1, c1443605-1444119) and ctg_000043 of A1163 (ABDB01000043.1, c422041-421524). An alignment analysis of the putative rpl39 gene sequences with those of several Aspergillus species gave the exon/intron structure and a total of 156 bp of CDS. The resulting amino acid sequences were the same between the Af293 and A1163 strains, and also the same as L39 of A. oryzae RIB40 and A. flavus AF70. The theoretical mass of L39 ion ([M+H]+) was determined as 6151.2 Da, and the corresponding peak was observed as shown in Fig. 1. These results strongly support the speculated sequence and expressed mass of L39 of the A. fumigatus strains.

In this manner, the verification of A. fumigatus RSPs under 20,000 Da could be performed by a combination of manual sequence inspections and MALDI-TOF MS measurements. Surprisingly, more than half (17 of 31) of the RSPs were incorrectly registered in the public protein databases, mainly due to erroneous annotations of exon/intron structures. In addition, two RSPs were registered as fusion proteins, and L39 was absent. The corrected CDS and amino acid sequences of these 17 RSPs are summarized in the supporting information Table SI-1 for A. fumigatus Af293 and Table SI-2 for A. fumigatus A1163.

The automatic annotation of exon/intron structures after whole-genome sequencing is likely to be imperfect, since the only clue to determining introns applied is the GT-AG rule (most introns start with GT and end with AG). Because accurate determination of cDNA by mRNA sequencing is both expensive and time-consuming, a full set of experimental cDNA sequence data of Aspergillus RSPs has not yet been reported. Our approach appears to be a simple and effective method of speculating accurate amino acid sequences of RSPs.

Post-translational modifications

Unidentified RSPs still remained after sequence correction, suggesting the presence of post-translational modification. In this study, post-translational modifications could be speculated for 11 RSPs, as described in this section. These modifications appear to be conserved in eukaryotes.

Acetylation, especially at the N-terminus, seems to be a common post-translational modification in eukaryotic RSPs. Nine RSPs (L31, L35, S11, S15, S16, S18, S21, S24, and S28) showed clear peaks at +42 Da over the calculated sequence mass, suggesting acetylation. For example, although the amino acid sequence of S21 is slightly different between Af293 and A1163 strains, clear peaks are seen in the +42 Da position for both samples, as shown in Fig. 4.

Fig. 4. Peak shift of +42 Da from sequence mass of S21. (a) A. fumigatus Af293, and (b) A. fumigatus A1163. Although the amino acid sequences and sequence masses of S21 are different between Af293 and A1163 strains, the observations of a common peak shift of +42 Da suggest common acetylation.

In yeast RSPs, when the penultimate amino acid residue is serine, N-terminal methionine loss followed by N-terminal acetylation is likely to occur.24,25) Among the nine probably acetylated RSPs, L31, L35, and S18 have an MS- sequence at the N-terminal side. In yeast RSPs, S21 with ME- and S28 with MD- are acetylated.25) This information strongly suggests the acetylation of S21 and S28 of the A. fumigatus strains, which have the same N-terminal sequences. Yeast S11, S15, S16, and S24 with MS- sequences are N-acetylated.25) However, rat S11 (in UniProtKB, P62282) and S1526) with MA- would also be N-acetylated. Therefore, S11 and S15 (and also probably S16) with MA- are likely to be N-acetylated.

Methylation is another possible post-translational modification of RSPs. Methylation of L42 at Lys-55 is evolutionarily conserved among eukaryotes.27) Because sequence homology around Lys-55 is high (yeast Lys-55 corresponds to Lys-50 of A. fumigatus by similarity), methylation is likely to be a post-translational modification of L42 of A. fumigatus. A clear peak could in fact be observed around m/z 12028.3, which takes account of +14 Da added to the calculated sequence mass.

Prolyl dihydroxylation of eukaryotic S23 is known as an evolutionarily conserved modification,28) and Pro-64 is hydroxylated in yeast S23. High sequence homology around Pro-64 of S23 suggests S23 of A. fumigatus strains to also be hydroxylated, resulting in a +32 Da shift. The corresponding peaks could be clearly observed around m/z 15802.5.
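The mass shifts discussed in this section (+42 Da for acetylation, +14 Da for methylation, +32 Da for dihydroxylation) can be tested against an unexplained mass difference with a small search over combinations. This is only a sketch: the average-mass shift values, the 2.5 Da tolerance, the function name, and the toy masses are assumptions, and a numerical match is a hint rather than proof of a modification.

```python
from itertools import combinations

# Approximate average-mass shifts in Da (assumed values for this sketch).
MODIFICATION_SHIFTS = {"+Ac": 42.04, "+Me": 14.03, "+Hyd (2)": 32.00}


def explain_shift(sequence_mh, observed_mz, tolerance_da=2.5):
    """Return the combinations of modifications whose summed shift matches the mass difference."""
    delta = observed_mz - sequence_mh
    labels = list(MODIFICATION_SHIFTS)
    candidates = []
    for size in range(len(labels) + 1):
        for combo in combinations(labels, size):
            shift = sum(MODIFICATION_SHIFTS[label] for label in combo)
            if abs(delta - shift) <= tolerance_da:
                candidates.append(combo)
    return candidates


# Hypothetical sequence mass and observed peak, about 42 Da apart.
print(explain_shift(10000.0, 10042.3))  # prints [('+Ac',)]
```

Note that some combinations lie close together (for example, methylation plus dihydroxylation is only about 4 Da from acetylation), so a candidate from such a search still needs support from conserved modification sites, as in the literature evidence cited above.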

List of ribosomal protein biomarkers and its applicability

In this way, we could finally confirm the masses of 31 of the 50 expressed RSPs under 20,000 Da. Most of the intense peaks observed under m/z 20,000 could be identified, as shown in Fig. 1. The remaining RSPs were probably not identified because of low ionization efficiency due to their acidic properties or because of unclear post-translational modifications (we found more putatively methylated and acetylated RSPs, but they are omitted from this paper due to a lack of supporting references). Tables 2 and 3 summarize the assigned ribosomal proteins of the A. fumigatus Af293 and A1163 strains, together with calculated masses and possible post-translational modifications. Almost all identified RSPs have the same sequence and mass in both strains, except for S21, which differs by only one amino acid.

Table 2. Assigned ribosomal subunit proteins of A. fumigatus Af293.
Yeast name | New system | Accession No. in UniProt | pI | Calculated mass as [M+H]+ | Observed mass | Sequence correction | Modifications a)
L40 | eL40 | A4D9S6 | 9.5 | 6002.3 | 6001.7 | yes | 
L39 | eL39 |  | 12.6 | 6151.2 | 6151.4 | yes | −Met
S29 | uS14 | Q4WLQ2 | 10.1 | 6646.7 | 6646.5 | yes | −Met
S30 | eS30 | Q4WCU4 | 11.5 | 6789.1 | 6789.5 |  | −Met
L29 | eL29 | Q4WKA9 | 11.6 | 7456.6 | 7457.0 |  | −Met
S28 | eS28 | Q4WGB8 | 10.9 | 7710.0 | 7710.0 | yes | +Ac
S31 | eS31 | Q4WXZ8 | 9.8 | 9134.9 | 9133.9 | yes | 
L38 | eL38 | Q4WP31 | 10.3 | 9153.8 | 9154.5 |  | −Met
L43 | eL43 | Q4WZH8 | 10.5 | 10025.8 | 10024.5 |  | −Met
S21 | eS21 | Q4WI01 | 8.5 | 10052.2 | 10053.2 |  | +Ac
L37 | eL37 | Q4WWR1 | 11.0 | 10386.9 | 10386.4 | yes | −Met
L30 | eL30 | Q4X1P9 | 9.9 | 11171.1 | 11170.9 |  | −Met
L36 | eL36 | Q4WNZ0 | 11.9 | 11869.8 | 11869.4 | yes | −Met
L42 | eL42 | Q4X205 | 10.5 | 12028.3 | 12028.1 | yes | −Met, +Me
L33 | eL33 | Q4WX73 | 10.2 | 12215.1 | 12215.2 | yes | −Met
L34 | eL34 | Q4WI54 | 9.6 | 13164.5 | 13164.4 | yes | −Met
S26 | eS26 | Q4WJ94 | 10.9 | 13338.7 | 13337.7 |  | −Met
L31 | eL31 | Q4WLK1 | 10.5 | 13919.1 | 13918.5 |  | −Met, +Ac
L35 | uL29 | Q4WT53 | 11.1 | 14532.0 | 14533.0 | yes | −Met, +Ac
L32 | eL32 | Q4WZN0 | 11.3 | 14836.6 | 14835.9 |  | −Met
L26 | uL24 | Q4WM42 | 10.9 | 14979.4 | 14979.1 | yes | −Met
S24 | eS24 | Q4WAQ6 | 10.7 | 15226.6 | 15226.0 |  | −Met, +Ac
L27 | eL27 | Q4WJD7 | 10.5 | 15682.6 | 15683.5 | yes | 
S23 | uS12 | Q873W8 | 10.5 | 15802.5 | 15801.9 |  | −Met, +Hyd (2)
S16 | uS9 | Q4X1C0 | 10.2 | 15883.4 | 15881.9 |  | −Met, +Ac
S17 | eS17 | Q4X1E0 | 10.0 | 16089.5 | 16087.9 | yes | −Met
S19 | eS19 | Q4WJN7 | 9.6 | 16351.4 | 16350.6 | yes | −Met
L28 | uL15 | Q4WWF0 | 10.4 | 16631.1 | 16630.9 |  | −Met
S15 | uS19 | Q4X1G1 | 10.1 | 17626.5 | 17626.5 |  | −Met, +Ac
S18 | uS13 | Q4WLH1 | 10.5 | 17779.5 | 17780.5 | yes | −Met, +Ac
S11 | uS17 | Q4WHU8 | 10.8 | 18478.6 | 18480.7 | yes | −Met, +Ac

a) −Met: N-terminal methionine loss, +Ac: acetylation, +Me: methylation, +Hyd: hydroxylation.

Table 3. Assigned ribosomal subunit proteins of A. fumigatus A1163.
Yeast name | New system | Accession No. in UniProt | pI | Calculated mass as [M+H]+ | Observed mass | Sequence correction | Modifications a)
L40 | eL40 | B0XNB9 | 9.5 | 6002.3 | 6001.9 | yes | 
L39 | eL39 |  | 12.6 | 6151.2 | 6151.5 | yes | −Met
S29 | uS14 | B0Y8V8 | 10.1 | 6646.7 | 6646.0 | yes | −Met
S30 | eS30 | B0YDK3 | 11.5 | 6789.1 | 6790.8 b) |  | −Met
L29 | eL29 | B0XMW3 | 11.6 | 7456.6 | 7457.5 |  | −Met
S28 | eS28 | B0YCF7 | 10.9 | 7710.0 | 7709.7 | yes | +Ac
S31 | eS31 | B0XXM3 | 9.8 | 9134.9 | 9135.0 | yes | 
L38 | eL38 | B0Y5X8 | 10.3 | 9153.8 | 9154.8 |  | −Met
L43 | eL43 | B0XVB7 | 10.5 | 10025.8 | 10025.2 b) |  | −Met
S21 | eS21 | B0XUN2 | 8.5 | 10038.1 | 10037.7 |  | +Ac
L37 | eL37 | B0XYW1 | 11.0 | 10386.9 | 10385.9 | yes | −Met
L30 | eL30 | B0XRW6 | 9.9 | 11171.1 | 11170.7 |  | −Met
L36 | eL36 | B0Y5T6 | 11.9 | 11869.8 | 11869.3 | yes | −Met
L42 | eL42 | B0XWA6 | 10.5 | 12028.3 | 12027.9 b) | yes | −Met, +Me
L33 | eL33 | B0XYE5 | 10.2 | 12215.1 | 12214.8 | yes | −Met
L34 | eL34 | B0XUB0 | 9.6 | 13164.5 | 13166.1 | yes | −Met
S26 | eS26 | B0XPP9 | 10.9 | 13338.7 | 13336.9 |  | −Met
L31 | eL31 | B0XLZ0 | 10.5 | 13919.1 | 13918.5 |  | −Met, +Ac
L35 | uL29 | B0XQH1 | 11.1 | 14532.0 | 14531.4 | yes | −Met, +Ac
L32 | eL32 | B0XV02 | 11.3 | 14836.6 | 14836.3 |  | −Met
L26 | uL24 | B0Y8G7 | 10.9 | 14979.4 | 14979.3 | yes | −Met
S24 | eS24 | B0YC29 | 10.7 | 15226.6 | 15225.9 |  | −Met, +Ac
L27 | eL27 | B0XPJ7 | 10.5 | 15682.6 | 15682.3 | yes | 
S23 | uS12 | B0XQ66 | 10.5 | 15802.5 | 15802.2 |  | −Met, +Hyd (2)
S16 | uS9 | B0XS84 | 10.2 | 15883.4 | 15884.1 b) |  | −Met, +Ac
S17 | eS17 | B0XS66 | 10.0 | 16089.5 | 16089.2 | yes | −Met
S19 | eS19 | B0XP26 | 9.6 | 16351.4 | 16350.7 | yes | −Met
L28 | uL15 | B0XZ73 | 10.4 | 16631.1 | 16630.4 |  | −Met
S15 | uS19 | B0XS46 | 10.1 | 17626.5 | 17626.0 |  | −Met, +Ac
S18 | uS13 | B0XM75 | 10.5 | 17779.5 | 17779.2 | yes | −Met, +Ac
S11 | uS17 | B0XUT5 | 10.8 | 18478.6 | 18479.8 | yes | −Met, +Ac

a) −Met: N-terminal methionine loss, +Ac: acetylation, +Me: methylation, +Hyd: hydroxylation. b) Shoulder peak.

To confirm the applicability of the reference mass list, RSPs of the neotype strain IFM 57323NT and a clinical isolate, IFM 62104, were further characterized. Because the criterion for species identification is similarity to the type strain, the characterization of the RSPs of IFM 57323NT is important for establishing a reliable biomarker list for the identification of A. fumigatus. The characterization of the clinical isolate IFM 62104, which had already been identified as A. fumigatus, was performed as a demonstration of the analysis of real samples.

Figure 5 shows the partial mass spectra of ribosomal protein fractions obtained from (a) the Af293, (b) A1163, (c) IFM 57323NT, and (d) IFM 62104 (whole mass spectra of IFM 57323NT and IFM 62104 are shown in Figs. SI-1 and SI-2 in the supporting information). In this mass range, seven identified RSPs (S31, L38, L43, S21, L37, L30, and L36) are commonly observed. Here, of two types of S21, the peak for IFM 57323NT and IFM 62104 appeared the same as S21 of A1163. In the entire mass spectra, all 31 RSP biomarkers could be observed for the IFM 57323NT and IFM 62104 strains. These results suggest that the reference mass list can be used as a clue for the species identification of A. fumigatus.

Fig. 5. Partial MALDI mass spectra of ribosomal protein fractions obtained from (a) clinical isolate IFM 62104, compared with those from A. fumigatus (b) Af293 and (c) A1163 strains. The right mass spectra are expanded between m/z 9,950–10,150.

CONCLUSION

In this study, we have investigated the actual state of RSPs in the public protein databases by characterizing the RSPs of genome-sequenced strains of A. fumigatus Af293 and A1163. As a result, we could solve the problems of the registered information of RSPs in the public protein databases.

As for the problem concerning the confusion of the nomenclature, all the RSP names were verified and unified to the yeast-based names, which are the most prevalent in the public protein databases (the names under the new unified naming system15) are also listed). As for the second problem, which originates from incorrect sequence information, we have pointed out that more than half of the A. fumigatus RSPs are incorrectly registered, mainly due to mis-annotation of exon/intron structures. Because RSPs are highly conserved, we could easily find candidates for the correct sequences and verify them by comparing the theoretical mass with the observed mass. In addition, post-translational modifications such as acetylation and methylation could also be confirmed.

By solving these problems, we have successfully completed the reference mass list of two genome-sequenced strains of A. fumigatus. By using the completed sequence information of the RSPs of A. fumigatus as a reference, information on the RSPs of other related fungal strains can be more easily verified by combining in silico inspection with MALDI-TOF MS measurements. We are proceeding with the characterization of RSPs of other Aspergillus genome-sequenced strains to make reliable lists of biomarker RSPs for identification of Aspergillus species. Once the Aspergillus RSP biomarker lists have been compiled, ribosomal protein-based MALDI-TOF MS is anticipated to be a powerful and reliable tool in the field of clinical microbiology.

Acknowledgments

This work was supported in part by a research grant from the Institute for Fermentation, Osaka (IFO), JSPS Kakenhi Grant Number 25430198, and the National Bioresource Project (Pathogenic Microbes) in Japan (http://www.nbrp.jp/).

REFERENCES
 
© 2016 Sayaka Nakamura, Hiroaki Sato, Reiko Tanaka, and Takashi Yaguchi. This is an open access article distributed under the terms of Creative Commons Attribution License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.