Abstract
Interpreting time-of-flight secondary ion mass spectrometry (TOF-SIMS) data from complex samples is often difficult because TOF-SIMS spectra are extremely information-rich. Multivariate analysis methods such as principal component analysis (PCA) and multivariate curve resolution (MCR) have therefore been used to interpret TOF-SIMS data. Developing a practical TOF-SIMS spectral database is also difficult because of complicating factors in the ionization mechanism, such as matrix effects. In this study, we evaluated the application of machine learning methods to the development of a TOF-SIMS data prediction system that includes an automatic label creation method. The chemical structures of model samples, written as strings based on the simplified molecular-input line-entry system (SMILES), were automatically divided, and the resulting parts were used as labels in a prediction system for organic, polymer, and peptide samples, with the aim of identifying unknown materials. The resulting TOF-SIMS spectrum prediction system, built with Random Forest, a supervised machine learning method, achieved an accuracy above 0.9.
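The automatic label creation step can be illustrated with a minimal sketch. The abstract does not state the rule used to divide the SMILES strings, so the division below (tokenizing the string and taking every run of `n` consecutive tokens as a label) is an assumption made purely for illustration. The names `tokenize`, `substring_labels`, and `label_matrix` are hypothetical, and the token pattern covers only common SMILES symbols.

```python
import re

# Assumed token pattern for common SMILES symbols: bracket atoms, Br/Cl,
# organic-subset atoms, aromatic atoms, bonds, branches, ring-closure digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[=#\-+\\/()%.:@]|\d"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; reject symbols the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def substring_labels(smiles, n=2):
    """Return the set of substrings made of n consecutive tokens.

    This division rule is an illustrative assumption, not the paper's method.
    """
    tokens = tokenize(smiles)
    return {"".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def label_matrix(smiles_list, n=2):
    """Build a sorted label vocabulary and one 0/1 row per sample."""
    per_sample = [substring_labels(s, n) for s in smiles_list]
    vocab = sorted(set().union(*per_sample))
    rows = [[1 if lab in labs else 0 for lab in vocab] for labs in per_sample]
    return vocab, rows


if __name__ == "__main__":
    # Ethanol and acetic acid, used here only as toy inputs.
    vocab, rows = label_matrix(["CCO", "CC(=O)O"])
    print(vocab)
    print(rows)
```

Each row of the resulting 0/1 matrix can serve as a multi-label target for a supervised classifier trained on the corresponding spectrum, for example a Random Forest as named in the abstract. How spectra are preprocessed into features is not described here and is left out of the sketch.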