Abstract book of Annual Meeting of the Japan Society of Vacuum and Surface Science
Online ISSN : 2434-8589
Annual Meeting of the Japan Society of Vacuum and Surface Science 2023
Session ID : 2P48
Conference information

November 1, 2023
Development of peptide and lipid SIMS spectrum prediction system using Random Forest and evaluation of RF importance of each label
Mine Iwahori, Daisuke Hayashi, Satoka Aoyagi
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is a powerful surface analysis technique that provides 3D molecular imaging and chemical structural information, and it is widely used in various fields. A ToF-SIMS spectrum generally contains several hundred to several thousand mass peaks. Identifying these peaks is often difficult because of peak overlap and matrix effects, and very few databases list the assignments of these peaks. Therefore, we developed a system to predict ToF-SIMS spectra of unknown organic materials such as peptides [1-3]. Since peptides are composed of 20 different amino acids, unknown peptides can be described by using the presence or absence of each amino acid as labels for supervised machine learning [2]. To predict ToF-SIMS spectra of general organic materials, the labels for supervised learning were improved by automatically segmenting molecular strings written in simplified molecular input line entry system (SMILES) notation, which serves as an annotation of molecular structure [3]. In this study, ToF-SIMS spectra of peptides, lipids, and their mixture were analyzed using Random Forest (RF), a supervised learning method based on decision trees. The spectra were converted to numerical data using the peak list from a former Versailles Project on Advanced Materials and Standards (VAMAS) project on machine learning applied to ToF-SIMS spectra [2]. This list contains 4230 mass peaks in the mass range 14-1200 Da, with peaks distinguishable between inorganic and organic materials. The intensities were then normalized to the total ion count to provide descriptors. The chemical structures of the 32 peptide and lipid molecules in the samples were written as SMILES strings, and the strings were divided into smaller strings to create modified labels for supervised learning [3]. Each label was assigned the number of times the structure it represents occurs in the molecule.
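The descriptor and label construction described above can be illustrated with a minimal sketch. It assumes that total-ion-count normalization means dividing each peak intensity by the sum over the peak list, and it stands in plain substring counting for the automatic SMILES segmentation of [3], whose actual rules are not given here. The function names, the toy intensities, and the fragment strings are hypothetical.

```python
from typing import Dict, List


def normalize_to_total_ion_count(intensities: List[float]) -> List[float]:
    """Divide each peak intensity by the summed intensity of the peak list."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no ion counts")
    return [value / total for value in intensities]


def count_labels(smiles: str, fragments: List[str]) -> Dict[str, int]:
    """Count non-overlapping occurrences of each fragment string in a SMILES string.

    Assumption: simple substring matching, not the segmentation method of [3].
    """
    return {fragment: smiles.count(fragment) for fragment in fragments}


# Toy spectrum with three peaks (hypothetical intensities, not measured data).
spectrum = [120.0, 30.0, 50.0]
descriptors = normalize_to_total_ion_count(spectrum)

# Glycylglycine as a small peptide example; the fragment list is illustrative only.
glycylglycine = "NCC(=O)NCC(=O)O"
labels = count_labels(glycylglycine, ["C(=O)N", "C(=O)O", "c1ccccc1"])

print(descriptors)
print(labels)
```

Substring counting depends on how the SMILES string happens to be written, so a real implementation would likely need canonical SMILES or substructure matching instead.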
RF was used to predict the spectral dataset with the modified labels. In addition, the RF feature importance for each label was obtained to evaluate the effectiveness of the labels. The RF prediction for all spectral data yielded a high percentage of correct predictions, suggesting that the prediction remains accurate even when lipids are included. In the RF importance results obtained with a single label, the mass peaks associated with the chemical structure specified by that label had the highest importance for most of the labels.

References
1) W. Ishikura, K. Takahashi, T. Yamagishi, D. Aoki, K. Fukushima, M. Shiga, S. Aoyagi, J. Surf. Anal., 25(2), 103-114 (2018).
2) S. Aoyagi, Y. Fujiwara, A. Takano et al., Analytical Chemistry, 93(9), 4191-4197 (2021).
3) K. Kamochi, M. Inoue, S. Ogawa, S. Aoyagi, J. Surf. Anal., 30(1), 15-27 (2023).
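The single-label RF analysis in the abstract can be sketched as follows, assuming that each label is predicted from the normalized peak intensities and that importance is read from the fitted forest. The sketch uses scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute; the abstract does not state which RF implementation or settings were used. The data are synthetic, and the peak index, sample sizes, and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

n_spectra, n_peaks = 60, 8  # toy sizes; the study's peak list has 4230 peaks

# Hypothetical label: number of times one substructure occurs per molecule (0, 1, or 2).
label_counts = rng.integers(0, 3, size=n_spectra)

# Synthetic spectra: random background, with peak index 3 made to grow with the label count.
spectra = rng.random((n_spectra, n_peaks))
spectra[:, 3] += 5.0 * label_counts
spectra /= spectra.sum(axis=1, keepdims=True)  # normalize each spectrum to its total ion count

# One forest per label; here a single label is fitted.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(spectra, label_counts)

# Impurity-based importance of each peak for this label.
importances = model.feature_importances_
top_peak = int(np.argmax(importances))
print(top_peak, importances.round(3))
```

Because the synthetic label is tied to one peak by construction, that peak should rank highest, which mirrors the kind of check reported in the abstract. Impurity-based importances can be biased toward some features, so permutation importance is a common cross-check.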

© 2023 The Japan Society of Vacuum and Surface Science