Peptide Peak Detection for Low Resolution MALDI-TOF Mass Spectrometry

Jingwen Yao; Shin-ichi Utsunomiya; Shigeki Kajihara; Tsuyoshi Tabata; Ken Aoshima; Yoshiya Oda; Koichi Tanaka

doi:10.5702/massspectrometry.A0030

Abstract

A new peak detection method has been developed for rapid selection of peptide and fragment ion peaks for protein identification using tandem mass spectrometry. The algorithm classifies the peak intensities present in a defined mass range to determine the noise level. A threshold is then set according to the noise level determined in each mass range, and ion peaks are selected with it. The algorithm was initially designed for peak detection in low-resolution peptide mass spectra, such as matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra, but it can also be applied to other types of mass spectra. The method achieves a good ratio of real ion peaks to noise peaks even for poorly fragmented peptide spectra. Peak lists generated by this method improve protein scores in database search results, and the reliability of protein identifications increases because more peptides are identified. The software tool is freely available at the Mass++ home page (http://www.first-ms3d.jp/english/achievement/software/).

INTRODUCTION

The first step of protein identification by mass spectrometry, after data acquisition, is to extract ion peaks from the mass spectra. In protein identification methods,^1–3) database search tools score the match between the experimental ions and the ions calculated theoretically from sequences in a protein sequence database. In de novo sequencing software, ion features accumulated from the spectra are used to build candidate ion series.^4,5) Database search tools and de novo sequencing software usually rank or select these candidates with various scoring schemes.^6–10) This technology has also been used to detect protein biomarkers^11,12) in cancer diagnostics. High-quality peak lists are a prerequisite for reliable scores from these protein identification methods.

Steady progress in peak detection for mass spectrometry has produced various types of ion peak detection software. However, as new techniques are applied in mass spectrometers, new peak detection methods are required to cope with the signal and noise features of the resulting spectra. When a mass spectrometer measures a sample at the protein/peptide level, the spectrum contains signal peaks that represent intact protein or peptide ions. In protein identification by tandem mass spectrometry, there is an even greater need to select the fragment ion peaks of the peptide accurately.

Combining liquid chromatography (LC) with a MALDI (matrix-assisted laser desorption/ionization) MS system strengthens MALDI mass spectrometry as a high-throughput platform for proteome analysis. LC-MALDI-TOF (TOF: time-of-flight ion detection) is a commonly used instrument configuration for measuring ions from digested protein samples. Peak lists generated from LC-MALDI spectral data should give high sequence coverage of the detected proteins.

Because excluding all noise is very difficult, a high-quality peak detection tool aims to maximize the number of ion signal peaks while keeping noise peaks to a minimum. Various peak detection methods^13–19) have been proposed and implemented in software. Several approaches have been used in peak detection algorithms:

- using the signal-to-noise ratio to filter out noise peaks;
- fitting specific shape functions to peak shapes or isotopic clusters to find the corresponding ion peaks;
- using a model with estimated parameters²⁰⁾ to distinguish peptide ions from background peaks;
- applying a continuous wavelet transform (CWT) to localize signals.

An evaluation²¹⁾ of publicly available peak detection software showed that the tools based on CWT¹⁸⁾ performed best on the selected MS spectral data. The CWT method detects peaks by finding ridges in the wavelet transform space. However, it needs a suitable mother wavelet function and is also parameter dependent. Peak characteristics such as shape and width can vary across the spectral range. When they do, it becomes more difficult to optimize the parameters to identify all possible signal peaks. As the scale factor increases, longer computing time may also be required to refine the peak parameters. The method that uses Bayesian estimation²⁰⁾ of parameters for modeling spectral peaks was reported to outperform the wavelet method in another test for identifying peptide ions in MS spectra. However, its parametric model did not include the isotopic pattern of ions. Because the isotopic pattern is usually a key factor in identifying peptide fragment ions, an extra step is then needed after the initial detection to find the mono-isotopic fragment ions.

The aim of this work is to find a suitable peak detection method and to build an appropriate computer program around it. The work was initially intended to solve the problem of low resolution in MALDI-TOF-TOF mass spectra. Peaks in such spectra can be distorted from their symmetrical shapes, and isotopic peaks may not be well resolved, so peak detection methods based on model fitting are more difficult to apply. We therefore propose a method based on intensity classification, in which peak shape matters less. The noise level is determined for each selected mass-to-charge (m/z) range. By determining the noise level in the data accurately, ion peaks can be selected more robustly. We implemented a new algorithm based on this classification in a computer program, MWD (Multi Window Detection). In MWD, peak detection relies mainly on the intensity distribution within each selected m/z range.

METHOD

Intensity classification

As observed in spectra, the noise level of a peptide spectrum may differ between mass ranges. To find the noise level more accurately, each MS/MS spectrum from peptide dissociation is divided into a number of ranges according to the precursor ion mass. An interval ΔM (between 100 and 300 Da) is chosen and used to step through the spectrum from the low mass end to the high mass end, which is close to the precursor ion mass. In this paper, ΔM = 120 Da is used because it is close to the average mass of the amino acid residues. The noise level in each range is determined from the data points within that range. Other peak detection programs simply filter peaks with some form of signal-to-noise ratio threshold. MWD instead uses the noise level detected in each mass range to determine how peaks are selected.
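The windowing step can be sketched as follows. This is a minimal illustration, not the MWD code. It assumes a centroided peak list of (m/z, intensity) tuples and starts the first window at the lowest observed m/z. The helper name `split_into_windows` is hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def split_into_windows(peaks: List[Peak], precursor_mz: float,
                       delta_m: float = 120.0) -> List[List[Peak]]:
    """Group peaks into consecutive windows of width delta_m (in Da).

    The first window starts at the lowest observed m/z (an assumption);
    peaks above the precursor m/z are ignored.
    """
    if not peaks:
        return []
    start = min(mz for mz, _ in peaks)
    n_windows = max(1, int((precursor_mz - start) // delta_m) + 1)
    windows: List[List[Peak]] = [[] for _ in range(n_windows)]
    for mz, intensity in peaks:
        if mz > precursor_mz:
            continue  # nothing is processed beyond the precursor mass
        index = min(int((mz - start) // delta_m), n_windows - 1)
        windows[index].append((mz, intensity))
    return windows


example = [(150.1, 20.0), (260.3, 5.0), (275.0, 80.0), (400.2, 3.0), (500.0, 9.0)]
for i, window in enumerate(split_into_windows(example, precursor_mz=520.0)):
    print(i, [mz for mz, _ in window])
```

Each window is then processed independently. Empty windows (such as the last one in this example) simply yield no peaks.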

Figure 1(a) shows a spectrum in a selected mass range, and Fig. 1(b) shows the intensity classification of the data points in that range. Determining the noise level of a selected mass range requires classifying the data points into three discrete groups. This assumes that three classes of peak intensities, A, B, and C, can be found, as shown in Fig. 1(b).

Fig. 1. MS/MS spectra are first divided into a number of mass ranges by a given interval, and intensity classification is carried out in each mass range. (a) A spectrum in a selected (binned) mass range. (b) The intensities of the data points in the mass range are classified as A, B, and C. The x-axis is the index of the peak in the range, and the y-axis is the intensity of each peak. In (a), ion peaks in A are indicated by solid arrows and those in B by dashed arrows. C contains all background peaks.

A, where the points have significant intensity compared with other data points in the range
B, which contains intermediate intensity values and
C, where the intensities are much lower than those in class A

Strong signal peaks are classified in A, but they are usually few in number. C contains the majority of data points, and these are mainly noise. B may contain some signal peaks with lower intensities than those in A. These peaks lie at the noise level boundary, so they are the most difficult to decide on during peak selection. Some of the peaks from A and B are marked in the spectrum of Fig. 1(a). Together with Fig. 1(b), this example illustrates how peak intensities are classified in the selected mass range.

In most cases, there is no simple method to distinguish the data points in B from the other two classes. The MWD process therefore has two steps: Determination of Noise Level and Refinement of Noise Level. The first step explicitly extracts the ion peaks in A in a given mass range, and the second step may identify additional ion peaks in B. Details are given in the following sections.

Determination of noise level

If C is treated as the main class, the points in A can be regarded as "outliers." An outlier detection method such as the Z-score²²⁾ can therefore be applied to determine whether any points belong to A. The points found in A are then temporarily removed. The noise level is determined from the points in C and possibly some from B, because the points in B may include low-intensity signal peaks mixed with noise peaks.

To implement this idea in a computer program, all data points in a selected mass range are first sorted in ascending order of intensity. A Z-score Z_i is then calculated for each point with intensity I_i in the range, based on the formula:

$$Z_i = \frac{I_i - I_m}{I_{sd}} \qquad (1)$$

where I_m and I_sd are the mean and standard deviation of all peak intensities in the range. In practice, a modified formula that uses the median absolute deviation²³⁾ M_mad instead of the standard deviation is used:

$$M_{mad} = \operatorname{median}_i\left(\left|I_i - I_{med}\right|\right) \qquad (2)$$

where I_med is the median of the original intensity array. Z_i is then modified as:

$$Z_i = \frac{I_i - I_{med}}{G \cdot M_{mad}} \qquad (3)$$

G is a constant scale factor that converts the median absolute deviation to the scale of the standard deviation. With this calculation, the Z values of the noise points are expected to be distributed symmetrically around zero.
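Eqs. (2) and (3) translate directly into a few lines of Python. The sketch below is illustrative only. The value G = 1.4826 is an assumption: it is the factor commonly used so that G·M_mad estimates the standard deviation of normally distributed data, and the text does not state the value used in MWD.

```python
from statistics import median
from typing import List

G = 1.4826  # assumed scale factor (1 / 0.6745), not taken from the paper


def modified_z_scores(intensities: List[float], g: float = G) -> List[float]:
    """Return Z_i = (I_i - I_med) / (g * M_mad) for each intensity (Eq. 3)."""
    i_med = median(intensities)
    m_mad = median(abs(x - i_med) for x in intensities)  # Eq. (2)
    if m_mad == 0:
        # MAD is zero, so Z is undefined; treat every point as non-outlying.
        return [0.0 for _ in intensities]
    return [(x - i_med) / (g * m_mad) for x in intensities]


window = sorted([4.0, 5.0, 6.0, 5.5, 4.5, 40.0])  # one strong peak
z = modified_z_scores(window)
print([round(v, 2) for v in z])
```

Because the median and MAD are barely affected by a single strong peak, that peak receives a large Z value while the noise points stay near zero. This robustness is the reason for preferring Eq. (3) over Eq. (1).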

Z_i reflects how far a measured value I_i lies from the mean/median value. A larger Z indicates an intensity that stands out more from the others in the range. The criterion for Z was learned from a reference data set in which the ion peaks in the spectra had been well identified. An initial criterion of Z = 3.0 is used to decide whether a point belongs to class A. The mean intensity I_mean of the data points remaining in the range after the points in A are removed is then calculated, and a further variable R_i is defined as:

$$R_i = \frac{I_i}{I_{mean}} \qquad (4)$$

where I_i > 0 and I_mean > 0, so that R_i is the ratio of each point's intensity to the mean. If the noise is distributed symmetrically, the mean equals the median. The point whose R_i is closest to 1.0 should then lie at the median position of the intensity array sorted in ascending order. If this position moves away from the median position, some low-intensity signal peaks in class B may have been included in the mean calculation.
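The removal of class A and the R_i check can be sketched as below. The sketch assumes the Z > 3.0 criterion from the text and G = 1.4826 as the scale factor; the function names are hypothetical. A nonzero offset means the mean no longer coincides with the median. In this sketch, a positive offset means higher-intensity points, such as low-intensity class-B signal peaks, have pulled the mean upward.

```python
from statistics import median
from typing import List, Tuple


def split_class_a(intensities: List[float], z_cut: float = 3.0,
                  g: float = 1.4826) -> Tuple[List[float], List[float]]:
    """Return (class_a, remaining), both sorted in ascending order."""
    data = sorted(intensities)
    i_med = median(data)
    m_mad = median(abs(x - i_med) for x in data)
    if m_mad == 0:
        return [], data  # Z is undefined; treat nothing as an outlier
    z = [(x - i_med) / (g * m_mad) for x in data]
    class_a = [x for x, zi in zip(data, z) if zi > z_cut]
    remaining = [x for x, zi in zip(data, z) if zi <= z_cut]
    return class_a, remaining


def ratio_position_offset(remaining: List[float]) -> int:
    """Index of the R_i closest to 1.0 minus the (lower) median index.

    Assumes all intensities are positive, as Eq. (4) requires.
    """
    i_mean = sum(remaining) / len(remaining)
    ratios = [x / i_mean for x in remaining]  # Eq. (4), ascending order
    closest = min(range(len(ratios)), key=lambda i: abs(ratios[i] - 1.0))
    return closest - (len(remaining) - 1) // 2


for window in ([6.0, 2.0, 100.0, 4.0, 3.0, 5.0],
               [2.0, 3.0, 4.0, 5.0, 6.0, 14.0, 16.0, 100.0]):
    class_a, rest = split_class_a(window)
    print(class_a, rest, ratio_position_offset(rest))
```

In the first window the remaining points are symmetric and the offset is zero. In the second, the two intermediate peaks (14 and 16) survive the Z cut and shift the R_i ≈ 1 position away from the median.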

All of these properties rest on the assumption that noise points are distributed symmetrically around the mean, i.e., follow a Gaussian distribution. Such a symmetric distribution is easier to observe in noise statistics computed from raw spectral data. However, this method uses peak intensities instead of raw data points. Peak intensities may no longer be normally distributed, because most of the data points in the low-intensity region have been cut off. Figure 2(a) shows histograms of the intensity values in seven selected mass ranges of a spectrum. The data points are still distributed around the mean, although not perfectly symmetrically. When some ion peaks are comparable to the noise level, they form a tail on the right side of the distribution, as in ranges 1, 6, and 7. The method aims to identify these ion peaks. Within each selected mass range, the peak with the lowest intensity is the one most certain to be noise, so an initial noise level is derived from this peak as follows:

Firstly, a d_i value is calculated as:

(5)

to measure the deviation of each point from the mean. From the definition of R_i in Formula (4), d_i is negative when the intensity is below the mean value. Figure 2(b) shows the data points of a spectrum distributed over [d_min, −d_min]. Here d_min is the minimum d value; it comes from the lowest intensity and is usually located at the first position in the ordered data array. −d_min is its mirror image on the positive side, and d = 0 corresponds to the mean of all noise intensities in the range. The data points are distributed on both sides of d = 0. −d_min is considered to correspond to the maximum intensity among the noise peaks (class C) and is used to determine an initial noise level. This level can be further adjusted and optimized according to features detected in the intensity distribution, such as those shown in Fig. 2(a), as discussed in the next section. A signal-to-noise (S/N) threshold is then set from the detected noise level, and the ion peaks in the range are selected based on this threshold.
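Eq. (5) is not reproduced in this text, so the following sketch ASSUMES the simplest form consistent with the description: d_i = R_i − 1. This form is zero at the mean and negative below it. Under that assumption, mirroring d_min to −d_min gives a noise level of I_mean(1 − d_min) = 2I_mean − I_min. The `snr` multiplier in `select_ion_peaks` is a hypothetical parameter standing in for the S/N threshold.

```python
from typing import List


def initial_noise_level(noise_candidates: List[float]) -> float:
    """Intensity at d = -d_min, ASSUMING d_i = I_i / I_mean - 1 for Eq. (5).

    noise_candidates are the intensities left after class A is removed;
    all values are assumed positive.
    """
    i_mean = sum(noise_candidates) / len(noise_candidates)
    d = [x / i_mean - 1.0 for x in noise_candidates]
    d_min = min(d)  # comes from the lowest intensity
    return i_mean * (1.0 - d_min)  # mirror to -d_min and convert back


def select_ion_peaks(intensities: List[float], noise_level: float,
                     snr: float = 1.0) -> List[float]:
    """Keep peaks above snr * noise_level (snr is a hypothetical parameter)."""
    return [x for x in intensities if x > snr * noise_level]


rest = [2.0, 3.0, 4.0, 5.0, 6.0, 14.0, 16.0]  # window after class A removal
level = initial_noise_level(rest)
print(round(level, 2))
print(select_ion_peaks(rest + [100.0], level))
```

In this example the two intermediate peaks (14 and 16) lie above the estimated level of about 12.3. This shows how low-intensity class-B peaks can be recovered together with the strong class-A peak.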

Fig. 2. (a) Histograms of all noise peaks in selected mass ranges. (b) Seven mass ranges obtained from an actual peptide MS/MS spectrum. Each curve represents the distribution, against d value, of the data points in class C (assumed to include all noise data points) for a different mass range of the spectrum. The x axis corresponds to the number of points found in class C.

Refinement of noise level

Peak detection in this approach relies on the noise level found in each defined mass range. The noise level is determined separately for each mass window from the distribution of peak intensities in that window. This allows ion peaks with low intensities to be selected as well. Once the noise level is known, ion peaks are selected with a threshold derived from it. Figure 3 shows the selected ion peaks (♦) and noise peaks (✳) of a spectrum in seven mass ranges. When plotted against d, the selected peaks always lie at the top. If no peaks lie above the threshold, as in the fourth mass range, no peaks are selected.

Fig. 3. Distribution of data points against the calculated d value in the different selected mass ranges. The line connects the detected noise levels of all seven mass ranges. Data points above the line are detected as ion peaks and included in the final peak list.

The Z-score calculation converts all peak intensities in the selected mass range to a common scale of deviation from the mean or median. It also makes it easy to see how far the peaks at either extreme deviate. Besides identifying significantly high peaks in the range, this value also reveals peaks with very low intensities. The minimum Z score in each mass range, Z_min, indicates how far the lowest peak deviates from the others, i.e., a large magnitude in the negative direction. If several such very-low-intensity points are included in the calculation, the derived noise level may be lower than the true one. The noise level from the preceding calculation is then further adjusted and optimized. This uses a scheme that combines two checks: comparing the R_i ≈ 1 position with the median position, as described in the previous section, and the Z_min value of the range.

A criterion on Z_min can be applied when adjusting the noise level. Conversely, if more data points with relatively high intensities are included in the noise calculation, the derived noise level may be too high. In this case, the level is reduced toward the real noise level. As a consequence of this optimization and adjustment, a few peaks may be added to or removed from the final peak list.

The parameters and variables involved in finding the noise level are determined from features detected in each spectrum and do not require manual adjustment. Therefore, unlike other peak detection software, MWD requires the user to choose only a few parameters in the interface:

- the type of instrument used to acquire the spectral data;
- optional pre-processing methods for the raw spectral data;
- a parameter that adds a few more peaks to the final peak list if the default selection is too short.

Program implementation

Raw spectral data are used as input so that each defined mass range contains enough points for the Z-score calculation. Several pre-processing steps are therefore required to obtain the best results. These may include smoothing, peak centroiding,^24,25) and, if necessary, baseline subtraction. If isotopic clusters are resolved in the spectra, mono-isotopic ion peaks are selected by a dedicated procedure²⁶⁾ in the program. MWD is developed in C#.Net. It can run as a standalone application, and its functions are also packaged as components that can be integrated into other analysis platforms. Figure 4 shows the basic workflow of the MWD program. The proposed method consists of four main steps: Divide mass range into n windows, Determine noise level, Refine noise level, and Select ion peaks. These steps are repeated until all divided mass ranges are processed. An extra step, Precursor ion correction, obtains an accurate precursor ion mass from the MS spectra. MWD is currently implemented in the freely available software Mass++,²⁷⁾ which can be downloaded from http://www.first-ms3d.jp/english/achievement/software.
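The four-step workflow can be tied together in a compact loop. This is a hypothetical sketch, not the MWD implementation. It assumes:

- G = 1.4826 and the Z > 3.0 cut-off for class A;
- the form d_i = R_i − 1 for Eq. (5);
- positive intensities;
- a peak is selected when its intensity exceeds the window's noise level;
- windows with fewer than three peaks are skipped.

The Refine noise level and Precursor ion correction steps are omitted because their rules are not specified in enough detail here.

```python
from statistics import median
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); intensities assumed positive


def window_noise_level(intensities: List[float], z_cut: float = 3.0,
                       g: float = 1.4826) -> float:
    """Determine noise level: drop class A (Z > z_cut), then mirror d_min."""
    i_med = median(intensities)
    m_mad = median(abs(x - i_med) for x in intensities)
    if m_mad == 0:
        rest = list(intensities)
    else:
        rest = [x for x in intensities if (x - i_med) / (g * m_mad) <= z_cut]
    i_mean = sum(rest) / len(rest)
    d_min = min(rest) / i_mean - 1.0  # assumed form d_i = R_i - 1
    return i_mean * (1.0 - d_min)


def mwd_pick(peaks: List[Peak], precursor_mz: float,
             delta_m: float = 120.0, min_points: int = 3) -> List[Peak]:
    """Divide into windows, determine a noise level per window, select peaks."""
    if not peaks:
        return []
    picked: List[Peak] = []
    mz_hi = min(mz for mz, _ in peaks)
    while mz_hi < precursor_mz:  # repeat until every window is processed
        mz_lo, mz_hi = mz_hi, min(mz_hi + delta_m, precursor_mz)
        window = [p for p in peaks if mz_lo <= p[0] < mz_hi]
        if len(window) < min_points:
            continue  # too few points for a noise estimate (assumed rule)
        level = window_noise_level([i for _, i in window])
        picked.extend(p for p in window if p[1] > level)
    return picked


spectrum = [(101.0, 2.0), (110.0, 3.0), (120.0, 4.0), (130.0, 5.0),
            (140.0, 6.0), (150.0, 14.0), (160.0, 16.0), (170.0, 100.0),
            (230.0, 5.0), (240.0, 5.0), (250.0, 5.0)]
print(mwd_pick(spectrum, precursor_mz=300.0))
```

Each window gets its own noise level. In the example, the flat second window contributes no peaks, while the first window yields both its strong peak and two intermediate ones.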

Fig. 4. Flowchart of the peak detection in MWD program.

RESULTS AND DISCUSSION

Comparison of quality of peak lists from peak detection

To evaluate the peak detection performance of the proposed method, we first examined the true/false positive (TP/FP) rates in the peak lists. All test spectra are MS/MS spectra (LC-MALDI-TOF-TOF, AXIMA Performance, Shimadzu/Kratos) of peptides with known sequences. The raw spectral data first went through a pre-processing procedure, and all peaks above a low intensity threshold were recorded in a peak list. All possible theoretical ions were also calculated from the sequence. The peak list was then matched to the theoretical ions within the mass range measurable by the instrument. Matched ion peaks were recorded as true peaks in the spectra, and the remaining peaks in the list as false peaks.

The test sample was bovine serum albumin (BSA) digested with trypsin. Different experimental conditions, including sample concentration, acquisition time scale, and laser strength, were applied when collecting the spectra from the instrument. Sequence numbers (BSA1 to BSA6) denote the data sets acquired under the different conditions.

First, the precision of all peak lists for the test spectra was examined. It is calculated as:

$$\text{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} \qquad (6)$$

where N_TP is the number of true ion peaks and N_FP the number of false ion peaks. Precision is thus the fraction of correct ion peaks in the peak list. A lower precision means that more false positive peaks are included in the detection. If all detected peaks in the peak list are correct ion peaks, the precision is 1.0. Another commonly used measure of correct detection is sensitivity, defined as:

$$\text{Sensitivity} = \frac{N_{TP}}{N_{TP} + N_{FN}} \qquad (7)$$

Sensitivity is the ratio of correctly detected ion peaks N_TP to all expected ion peaks (N_TP + N_FN), where N_FN (false negatives) is the number of true ion peaks that the method does not detect. The expected ion peaks are determined by matching the theoretical fragment ions of the peptide with the peaks present in the peak lists. The theoretical fragment ions are not used directly, because in practice a peptide rarely dissociates to produce the full pattern of fragment ions in a spectrum. The maximum value of sensitivity is also 1.0.
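Eqs. (6) and (7) can be computed per spectrum as sketched below. The sketch rests on these assumptions:

- A detected peak is "true" if it lies within a hypothetical tolerance `tol` of any theoretical fragment ion.
- The expected ion peaks are the true peaks in the low-threshold reference peak list.
- The detected peaks are a subset of that reference list.

All m/z values in the example are invented for illustration.

```python
from typing import List, Tuple


def matches(mz: float, targets: List[float], tol: float) -> bool:
    """True if mz lies within tol of any target m/z."""
    return any(abs(mz - t) <= tol for t in targets)


def precision_sensitivity(detected: List[float], reference: List[float],
                          theoretical: List[float],
                          tol: float = 0.5) -> Tuple[float, float]:
    """Eq. (6) and Eq. (7) for one spectrum.

    detected: peak m/z values picked by the method (assumed a subset of
    reference); reference: all peaks above a low intensity threshold;
    theoretical: fragment ion m/z values calculated from the sequence.
    """
    n_tp = sum(1 for mz in detected if matches(mz, theoretical, tol))
    n_fp = len(detected) - n_tp
    picked = set(detected)
    true_in_reference = [mz for mz in reference if matches(mz, theoretical, tol)]
    n_fn = sum(1 for mz in true_in_reference if mz not in picked)
    precision = n_tp / (n_tp + n_fp) if detected else 0.0
    sensitivity = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    return precision, sensitivity


theoretical = [175.12, 262.15, 375.24, 488.32]  # hypothetical fragment ions
reference = [175.1, 200.0, 262.2, 300.5, 375.3, 410.0]
detected = [175.1, 262.2, 300.5, 410.0]
print(precision_sensitivity(detected, reference, theoretical))
```

The theoretical ion at 488.32 is never observed, so it does not count as a false negative. This reflects the text's point that the expected ions come from the spectrum rather than from the full theoretical list.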

For comparison, peak lists were also generated from the same spectra with the Distiller peak detection tool from Matrix Science (http://www.matrixscience.com). In numerous internal tests on MALDI-TOF-TOF spectra, Distiller usually produced better peak lists than other programs. Several Distiller parameters can be set to control the length of the peak lists. Different parameter settings may also affect peak list quality and lead to different Mascot database search scores. However, all Distiller peak lists were generated with the default parameter set optimized by Matrix Science, because it is very difficult to find the best parameter set for a large number of spectra from this instrument. The Distiller peak lists serve as the benchmark against which all MWD peak lists are compared.

Figure 5 shows the precision (a) and sensitivity (b), with descriptive statistics, of the peak lists obtained by Distiller and MWD from a set of BSA spectral data. All raw MS and MS/MS spectra were input into the peak detection programs to produce a peak list for each peptide spectrum. The MS spectra were used to correct the corresponding precursor ions, a correction step provided in this program. The median precision for BSA1–BSA6 ranges from 62 to 85% for the MWD peak lists and from 61 to 91% for the Distiller peak lists [Fig. 5(a)], although the numbers of identified peptides in the two data sets differ. Note that Distiller usually produces shorter peak lists with the given processing parameters. However, more peptides can be identified with the MWD peak lists. Identification by database search is discussed in more detail later.

Fig. 5. Precision (a) and sensitivity (b) of the real ions in the peak lists produced by the Distiller and MWD peak detection tools, measured over six BSA samples (BSA1–6). The numbers (n) of peptides identified from the Distiller lists are: BSA1 (n=14); BSA2 (n=19); BSA3 (n=9); BSA4 (n=7); BSA5 (n=11); BSA6 (n=15); and from the MWD lists: BSA1 (n=34); BSA2 (n=22); BSA3 (n=22); BSA4 (n=15); BSA5 (n=20); BSA6 (n=29). The median of each data set is drawn in the middle of a box; the lower and upper quartiles form the boundaries of the box; the symbol + represents an outlier. The box plots for Distiller and MWD are shown on the left and right sides, respectively.

We also investigated the MWD peak lists with lower precision. For instance, in BSA4 the minimum precision is ∼28%, for a spectrum acquired from a precursor ion at 2045 Da. Inspection of this spectrum showed that fragmentation was poor. Apart from a few peaks in the low mass range, most ion peaks have low intensities and sit at the noise level boundary. Nevertheless, the peak list from this spectrum still retrieved the correct peptide hit, RHPYFYAPELLYYANK, from protein BSA with the Mascot search engine. The Distiller peak list for this peptide did not yield the correct result, although its precision was higher (∼31%).

These plots show only how accurately the proposed method selects real ion peaks. If only strong ion peaks are recorded in a peak list, the precision can easily reach ∼1.0, because all the peaks may be correct ions, as with the Distiller peak lists in this test. Sensitivity, the other statistical measure, reflects the probability of a positive detection given that a peak is a real ion. It may therefore be more suitable for comparing the results of these different peak detection methods. Figure 5(b) shows the sensitivity of the peak lists from the two peak detection tools. For a fair comparison, the MWD set retained only the peptide spectra that could also be identified from the Distiller peak lists. On average, the MWD peak lists give higher values, confirming that MWD peak detection identifies and picks more fragment ions.

Figures 6(a) and (b) show scatter plots of TP rate against FP rate for two selected data sets from which more peptides could be identified. They show the distribution of correct detections by the two tools. In plot (a), the Distiller peak lists have a low FP rate but also a low TP rate. In plot (b), Distiller gives some good TP rates but also some low ones. In contrast, most MWD peak lists have TP rates in the upper range with FP rates below 0.2, showing that MWD gives more consistent peak detection results. Across all data, the Distiller peak lists generally have a lower false positive rate, but usually also a low true positive rate. This indicates that, in most cases, Distiller selected only ion peaks of significant height. The MWD peak lists reach a higher true positive rate with a reasonably low false positive rate. The two examples in Fig. 6 are typical of the relationship between TP and FP rates obtained with these two tools. Further tests were applied to evaluate the performance of the peak lists in protein identification.

Fig. 6. Analysis of the quality of peak lists generated by the peak detection tools. The same peptide spectra were used to generate peak lists with MWD (•) and Distiller (✳). TP (true positive) and FP (false positive) rates were calculated from the same theoretical fragment ion list for each peptide, and the TP rate is plotted against the FP rate. (a) and (b) show the results for two different data sets. The diagonal line in each plot is the line of no discrimination.

Peak lists for the spectra with different quality of fragmentation

The performance of the method in selecting ion peaks was also investigated for spectra with different fragmentation quality. Three peptide spectra with different levels of fragmentation quality were randomly selected for testing. They were categorized as follows:

- high (SpecH): the spectrum contained a sufficient number of strong fragment ions;
- low (SpecL): only a few fragment ions were formed, and most peaks in the spectrum were noise;
- medium (SpecM): between the two.

Details of these peptides and the test results are listed in Table 1.

Table 1. Peptide details for testing performance of the peak detection with different quality of fragmentation in spectra.

Spectrum	Peptide	MWD Score ^a)	MWD No. ion ^b)	Distiller Score ^a)	Distiller No. ion ^b)
SpecH	YNGVFQECCQAEDK	75	52	115	39
SpecM	SLHTLFGDELCK	50	34	49	14
SpecL	DDPHACYSTVFDK	34	24	14	8

^a) “Score” is the MS/MS Mascot search score. ^b) “No. ion” is number of matched ions from the peak lists to the expected peptide fragment ions.

Peak lists for each raw spectrum were generated with this peak detection program (MWD) and with Distiller. The peak lists were then used to search the Swiss-Prot protein database with the Mascot search engine. The same search parameters were used for both sets of peak lists. These included the tolerances (peptide mass tolerance 0.6 Da, fragment mass tolerance 1.5 Da), the modifications (fixed: Carbamidomethyl (C); variable: Oxidation (M)), and the instrument type (MALDI-TOF-TOF). The Mascot scores obtained from the peak lists of the two programs can therefore be compared directly. A large fragment mass tolerance was chosen so that the ions in the poor-quality spectrum could be used. As shown in Table 1, both peak lists reach very similar scores for the medium-quality spectrum, SpecM [50 (MWD), 49 (Distiller)]. However, the MWD list matches more ions (34) than the Distiller list (19). Of the other two spectra, the Distiller list obtained a higher search score than the MWD list for SpecH [115 (Distiller), 75 (MWD)] but a lower score for SpecL [14 (Distiller), 34 (MWD)]. Distiller favors selecting high peaks from the spectra. Its peak list therefore easily achieves a higher Mascot score when fragmentation is good and many strong ion peaks are present. This is the case for SpecH, where Distiller identified 39 strong fragment ions and the MWD peak list contained 52 fragment ions. For spectra with poor fragmentation, however, this advantage is lost. MWD, in contrast, performs well on both high- and low-quality spectra. In general, MWD peak lists achieve high coverage of real ion peaks, which is another important factor in finding reliable peptide hits when proteins are identified by MS/MS database search.

Comparison of database search results

In protein identification by searching MS/MS spectra against the peptides in a protein database, a key point is the reliability²⁸⁾ of the matches derived from the given peak lists. This includes reliable matching of fragment ions in individual MS/MS spectra to the candidate peptides in the database. More reliably matched MS/MS spectra increase the peptide coverage of the identified protein, which gives greater confidence in the identification. The test data come from standard protein samples: BSA, lysozyme (LZM), and alcohol dehydrogenase (ADH). When the Mascot search engine is used to find the expected proteins in the database, two Mascot scores²⁹⁾ are therefore used directly to evaluate the peak lists from the developed peak detection tool. The Distiller peak lists were searched with the same parameters for comparison. The expected proteins were ranked as the top hit with a significant score in all cases except the Distiller list for BSA8. Table 2 shows the complete results for all MS/MS spectra acquired from each protein sample. In most cases, the MWD lists yield higher total protein scores than the Distiller lists, because the MWD peak lists validate more MS/MS spectra. In other words, more MS/MS spectra from the whole range of LC retention times can be used to identify peptides than with the Distiller lists. This in turn increases peptide coverage in protein identification.

Table 2. Mascot score of total protein match using peak lists from MWD and Distiller for all MS/MS spectra.

Experiment	Mascot search score (MWD list)	Mascot search score (Distiller list)
BSA1	955	689
BSA2	890	473
BSA3	746	338
BSA4	440	234
BSA5	650	378
BSA6	823	438
BSA7	362	82
BSA8	165	N/A*⁾
BSA9	325	175
ADH1	330	209
ADH2	157	160
LZM1	113	82
LZM2	118	83
LZM3	141	122
LZM4	175	143

*⁾ N/A indicates no hit found for the correct protein.
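The "in most cases" statement can be checked directly against Table 2. The short script below transcribes the table (None stands for N/A) and groups the experiments by which peak list produced the higher total protein score; the names `TABLE2` and `compare` are illustrative only.

```python
# Total protein scores transcribed from Table 2: (MWD list, Distiller list).
# None marks "N/A" (no hit found for the correct protein).
TABLE2 = {
    "BSA1": (955, 689), "BSA2": (890, 473), "BSA3": (746, 338),
    "BSA4": (440, 234), "BSA5": (650, 378), "BSA6": (823, 438),
    "BSA7": (362, 82), "BSA8": (165, None), "BSA9": (325, 175),
    "ADH1": (330, 209), "ADH2": (157, 160),
    "LZM1": (113, 82), "LZM2": (118, 83), "LZM3": (141, 122), "LZM4": (175, 143),
}


def compare(table):
    """Group experiments by which peak list gave the higher protein score."""
    groups = {"MWD higher": [], "Distiller higher": [], "tie": [], "Distiller N/A": []}
    for name, (mwd, distiller) in table.items():
        if distiller is None:
            groups["Distiller N/A"].append(name)
        elif mwd > distiller:
            groups["MWD higher"].append(name)
        elif distiller > mwd:
            groups["Distiller higher"].append(name)
        else:
            groups["tie"].append(name)
    return groups


for label, names in compare(TABLE2).items():
    print(f"{label}: {len(names)} {names}")
```

It reports 13 experiments in which MWD scores higher, one (ADH2) in which Distiller scores slightly higher, no ties, and BSA8, where the Distiller list found no hit for the correct protein.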

The Mascot search score directly reflects the quality of the match between the experimental MS/MS spectrum and the proposed peptide. The highest ion scores of the distinctly matched peptides are summed to give the protein score. Note that the Mascot score is based on a calculated probability; the total number of matches between the experimental peaks and the theoretical ions is therefore not a key factor in calculating the ion score. This implies that a peak list containing only a small number of intense peaks may yield a higher ion score. The ion coverage of a peak list was instead expressed through the TP/FP rate curves and the precision study given above. The following results show how the peak lists performed in finding the expected protein matches with a general protein identification method.
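The probability basis of the score can be made concrete. Mascot reports an ion score of -10 log10(P), where P is the probability that the observed match is a random event. The sketch below converts in both directions; the helper names `ion_score` and `p_from_score` are hypothetical, and the sketch does not model how Mascot computes P itself.

```python
import math


def ion_score(p_random):
    """Mascot-style ion score: -10 * log10(P), where P is the probability
    that the observed peptide match is a random event."""
    if not 0.0 < p_random <= 1.0:
        raise ValueError("P must be in (0, 1]")
    return -10.0 * math.log10(p_random)


def p_from_score(score):
    """Inverse transform: random-match probability for a given ion score."""
    return 10.0 ** (-score / 10.0)


print(round(ion_score(0.001), 6))                      # 30.0
# A 10-point score difference is a 10-fold change in P, regardless of
# how many experimental peaks were matched to reach it.
print(round(p_from_score(50) / p_from_score(60), 6))   # 10.0
```

Because the score depends on P rather than on the raw count of matched peaks, a short list of intense, well-placed peaks can score as high as, or higher than, a longer list that covers more real ions.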

Table 3 details the Mascot ion scores and shows how the MS/MS spectra in sample BSA3 match peptides in the BSA sequence from the database. With the MWD peak lists, 22 spectra identified 18 peptides from the BSA sequence, reaching a protein sequence coverage of 33%, whereas the peak lists from Distiller identified only 10 peptides from 12 spectra, with 16% sequence coverage. Although this is a single example, it shows that protein identification using MWD peak lists can more easily provide reliable results; similar results are common in the test data, where MWD peak lists generally give higher protein scores in the database search.
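Protein sequence coverage, as quoted for BSA3 (33% and 16%), is the fraction of residues in the protein sequence covered by at least one identified peptide. A minimal sketch of that calculation follows, assuming exact substring matching and ignoring modifications; the function name `sequence_coverage` and the toy sequence are hypothetical and are not BSA.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide.
    Every occurrence of each peptide is marked; overlaps count once."""
    covered = [False] * len(protein)
    for pep in set(peptides):          # duplicate identifications add nothing
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


# Toy sequence for illustration only.
print(sequence_coverage("MKTAYIAKQR", ["TAYIAK", "QR", "QR"]))  # 0.8
```

Counting distinct peptides rather than spectra reflects why 22 spectra yield 18 peptides: repeated identifications of the same peptide do not extend coverage.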

Table 3. Details of the Mascot ion scores obtained for sample BSA3 using peak lists from the two peak detection tools.

Well #	Precursor ion (Da)	Peptide	Mascot search score (MWD list)	Mascot search score (Distiller list)
17	1072.648	SHCIAEVEK	24	X^b)
23	1443.798	YICDNQDTISSK	68	X
30	1674.062	QEPERNECFLSHK	19	X
35	1927.895	CCAADDKEACFAVEGPK	36	X
40	1554.788	DDPHACYSTVFDK	23	16
43	1305.760	HLVDEPQNLIK	31	X
43	1576.932	LKPDPNTLCDEFK	18	X
44	927.463	YLYEIAR	50	48
45	1640.114	KVPQVSTPTLVEVSR	69	69
46	1512.013	VPQVSTPTLVEVSR	54	X
49	1881.019	RPCFSALTPDETYVPK	41	19
50	927.436	YLYEIAR	[32]^a)	[31]
51	1163.495	LVNELTEFAK	59	46
53	1439.835	RHPEYAVSVLLR	57	30
56	1283.899	HPEYAVSVLLR	47	23
56	1419.878	SLHTLFGDELCK	66	71
59	927.557	YLYEIAR	[23]	X
59	1479.976	LGEYGFQNALIVR	31	X
63	927.698	YLYEIAR	[27]	[20]
63	1419.950	SLHTLFGDELCK	[44]	[12]
70	2045.571	RHPYFYAPELLYYANK	11	X
70	1480.156	LGEYGFQNALIVR	X	17
72	1567.923	DAFLGSFLYEYSR	42	X
		Total score (Coverage)	746 (33%)	338 (16%)

^a) Scores in brackets are for duplicate peptides in the search results. ^b) X indicates that no match to the peptide was found from the peak list.
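The MWD total in Table 3 can be reproduced from the per-spectrum ion scores with a simple rule: keep the highest ion score for each distinct peptide sequence and sum them, so bracketed duplicates do not add to the total. Treating this simplified rule as the way the totals were obtained is an assumption (Mascot's protein scoring has further details), and only the MWD column is transcribed here; `MWD_MATCHES` and `protein_score` are illustrative names.

```python
# (peptide, MWD ion score) for every Table 3 spectrum matched with the MWD
# peak list, in table order; bracketed duplicates included, X rows omitted.
MWD_MATCHES = [
    ("SHCIAEVEK", 24), ("YICDNQDTISSK", 68), ("QEPERNECFLSHK", 19),
    ("CCAADDKEACFAVEGPK", 36), ("DDPHACYSTVFDK", 23), ("HLVDEPQNLIK", 31),
    ("LKPDPNTLCDEFK", 18), ("YLYEIAR", 50), ("KVPQVSTPTLVEVSR", 69),
    ("VPQVSTPTLVEVSR", 54), ("RPCFSALTPDETYVPK", 41), ("YLYEIAR", 32),
    ("LVNELTEFAK", 59), ("RHPEYAVSVLLR", 57), ("HPEYAVSVLLR", 47),
    ("SLHTLFGDELCK", 66), ("YLYEIAR", 23), ("LGEYGFQNALIVR", 31),
    ("YLYEIAR", 27), ("SLHTLFGDELCK", 44), ("RHPYFYAPELLYYANK", 11),
    ("DAFLGSFLYEYSR", 42),
]


def protein_score(matches):
    """Sum the highest ion score per distinct peptide sequence; return
    (total score, number of distinct peptides)."""
    best = {}
    for peptide, score in matches:
        best[peptide] = max(score, best.get(peptide, score))
    return sum(best.values()), len(best)


total, n_peptides = protein_score(MWD_MATCHES)
print(total, n_peptides, len(MWD_MATCHES))  # 746 18 22
```

For the MWD column, the rule gives the total score of 746 and the 18 peptides from 22 spectra quoted in the text.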

CONCLUSION

A computer program has been developed to detect ion peaks in mass spectra. The method determines the noise level from the data points within a selected mass range, so many of the parameters normally required to control peak detection are unnecessary. The method is effective in selecting ion peaks, particularly in low-resolution spectra. The test results confirmed that this peak detection not only achieves high true positive rates for mass spectra with high-quality fragmentation, but also gives reasonable true/false positive rates for spectra that contain few strong fragment ions. As a result, a greater number of MS/MS spectra from high-throughput experiments can be used to find peptide sequences, leading to higher protein sequence coverage. Although the initial aim was to detect ion peaks in low-resolution spectra, the method can also be applied to mass spectra with higher resolution.
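The central idea summarized here, a noise level estimated separately for each mass range and a peak threshold set from it, can be sketched in a few lines. The exact intensity-classification rule used by MWD is not given in this section, so the sketch substitutes a median-based noise estimate and a multiplicative factor `k`; `detect_peaks`, `window` and `k` are hypothetical, and this is not the Mass++ implementation.

```python
from statistics import median


def detect_peaks(mz, intensity, window=100.0, k=3.0):
    """Return indices of local maxima whose intensity exceeds k times a
    noise level estimated separately in each m/z window of width `window`.

    Assumes `mz` is sorted ascending. Using the median as the noise level
    and a fixed factor k are illustrative substitutes for MWD's own
    intensity-classification step.
    """
    n = len(mz)
    peaks = []
    start = mz[0]
    while start <= mz[-1]:
        idx = [i for i in range(n) if start <= mz[i] < start + window]
        if idx:
            noise = median([intensity[i] for i in idx])   # local noise estimate
            threshold = k * noise
            for i in idx:
                left = intensity[i - 1] if i > 0 else float("-inf")
                right = intensity[i + 1] if i < n - 1 else float("-inf")
                # Keep points that are local maxima and clear the local threshold
                if intensity[i] > threshold and intensity[i] >= left and intensity[i] > right:
                    peaks.append(i)
        start += window
    return peaks


mz = [float(m) for m in range(200)]
intensity = [10.0] * 100 + [100.0] * 100
intensity[20] = 50.0     # quiet window: threshold 3 x 10 = 30
intensity[150] = 200.0   # noisy window: threshold 3 x 100 = 300
print(detect_peaks(mz, intensity))  # [20]
```

In the example, the 50-count peak is kept because its window's threshold is 30, while the 200-count peak is rejected because its window's threshold is 300; the threshold follows the local noise rather than a single global setting.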

Acknowledgements

This research was supported by the Japan Society for the Promotion of Science (JSPS) through the “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program),” initiated by the Council for Science and Technology Policy (CSTP).

REFERENCES
