Proceedings of the Symposium on Chemoinformatics
28th Symposium on Chemical Information and Computer Sciences, Osaka

Oral Session
Sample selection in Partial Least Squares (PLS) by multi-objective Genetic Algorithms (GA)
*Hideyuki Shinzawa, Takehiro Nakagawa, Katsuhiko Maruo, Yukihiro Ozaki

Pages J09

Abstract

A sample selection method for Partial Least Squares (PLS) model building was proposed. The method aims to improve the performance of a PLS model by eliminating uninformative samples that contain systematic errors from the data set. First, the data set is divided into three subsets: a training set, a constraint set, and a test set. Then a multi-objective GA is run on the training and constraint sets to find the samples with systematic errors. The combinations of samples to be removed from the training or constraint set are obtained as Pareto solutions. Finally, the Pareto solutions are evaluated on the test set in terms of correlation coefficients and the root mean square of the standard error of prediction. The proposed method was applied to near-infrared (NIR) spectra measured on the surface of human skin. The results showed that one of the Pareto solutions markedly improves the performance of the PLS model. The method was also used to analyze the effect of systematic errors in detail; this analysis showed that scatter and random noise in the original data are associated with the systematic errors. These results indicate that the proposed method is useful not only for improving the performance of PLS models but also for analyzing the effect of systematic errors.
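The abstract does not state the GA's objectives, operators, or PLS settings, so the following is only a minimal NumPy sketch of the workflow under explicit assumptions. It assumes: (1) two hypothetical objectives, namely the RMSE of a PLS1 model built on the retained training samples when predicting the retained constraint samples, and the number of removed samples; (2) a simple dominance-count ranking with elitist survival, which is a simplified Pareto-based GA and not NSGA-II or the authors' exact algorithm; (3) the root mean square error of prediction (RMSEP) as the test-set error statistic, which may differ from the statistic used in the original work. The names `pls1_fit`, `pls1_predict`, `objectives`, `moga_select`, and `evaluate_on_test` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                        # score vector
        tt = t @ t
        p = E.T @ t / tt                 # X loading
        c = f @ t / tt                   # y loading
        E = E - np.outer(t, p)           # deflate the X block
        f = f - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, B

def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean

def objectives(mask, Xtr, ytr, Xco, yco, n_comp):
    """ASSUMED objectives (both minimized): constraint-set RMSE, number of removed samples."""
    keep_tr, keep_co = mask[:len(ytr)], mask[len(ytr):]
    if keep_tr.sum() <= n_comp + 1 or keep_co.sum() < 2:
        return (np.inf, np.inf)          # infeasible: too few samples left
    model = pls1_fit(Xtr[keep_tr], ytr[keep_tr], n_comp)
    resid = pls1_predict(model, Xco[keep_co]) - yco[keep_co]
    return (float(np.sqrt(np.mean(resid ** 2))), int((~mask).sum()))

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def moga_select(Xtr, ytr, Xco, yco, n_comp=2, pop=30, gens=40, p_mut=0.05, seed=0):
    """Simplified Pareto GA over keep-masks (True = keep) for training + constraint samples."""
    rng = np.random.default_rng(seed)
    n = len(ytr) + len(yco)
    P = rng.random((pop, n)) > 0.1       # start by keeping roughly 90% of samples
    P[0] = True                          # include the "remove nothing" mask
    for _ in range(gens):
        F = [objectives(m, Xtr, ytr, Xco, yco, n_comp) for m in P]
        rank = np.array([sum(dominates(g, f) for g in F) for f in F])
        # Binary tournament: prefer the individual dominated by fewer others.
        i, j = rng.integers(pop, size=(2, pop))
        parents = P[np.where(rank[i] <= rank[j], i, j)]
        # Uniform crossover with the neighbouring parent, then bit-flip mutation.
        swap = rng.random((pop, n)) < 0.5
        kids = np.where(swap, parents, np.roll(parents, 1, axis=0))
        kids ^= rng.random((pop, n)) < p_mut
        # Elitist survival: keep the `pop` least-dominated masks of parents + children.
        U = np.vstack([P, kids])
        FU = [objectives(m, Xtr, ytr, Xco, yco, n_comp) for m in U]
        rU = [sum(dominates(g, f) for g in FU) for f in FU]
        order = sorted(range(len(U)), key=lambda k: (rU[k], FU[k]))
        P = U[order[:pop]]
    F = [objectives(m, Xtr, ytr, Xco, yco, n_comp) for m in P]
    front = {}
    for m, f in zip(P, F):
        if np.isfinite(f[0]) and not any(dominates(g, f) for g in F):
            front[f] = m                 # one mask per distinct objective vector
    return [(f, front[f]) for f in sorted(front)]

def evaluate_on_test(mask, Xtr, ytr, Xte, yte, n_comp=2):
    """ASSUMPTION: refit on retained training samples; report (r, RMSEP) on the test set."""
    keep_tr = mask[:len(ytr)]
    pred = pls1_predict(pls1_fit(Xtr[keep_tr], ytr[keep_tr], n_comp), Xte)
    r = float(np.corrcoef(pred, yte)[0, 1])
    rmsep = float(np.sqrt(np.mean((pred - yte) ** 2)))
    return r, rmsep

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=60)
    y[[3, 7]] += 5.0  # synthetic stand-in for training samples with systematic errors
    tr, co, te = slice(0, 30), slice(30, 45), slice(45, 60)
    for f, mask in moga_select(X[tr], y[tr], X[co], y[co], n_comp=3):
        r, e = evaluate_on_test(mask, X[tr], y[tr], X[te], y[te], n_comp=3)
        print(f"constraint RMSE={f[0]:.3f} removed={f[1]} test r={r:.3f} RMSEP={e:.3f}")
```

Each Pareto solution is a keep-mask; scoring every mask on the held-out test set mirrors the abstract's final step, because the constraint-set RMSE alone can be lowered simply by discarding difficult constraint samples. The synthetic data here only exercise the code and say nothing about NIR skin spectra.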

© 2005 The Chemical Society of Japan