Abstract
Partial least squares (PLS) has been widely used in chemometrics for the regression and classification of high-dimensional data, especially data in which the number of samples n is much smaller than the number of variables p (n << p). Canonical correlation analysis (CCA) is a classic method of multivariate analysis, but it has rarely been applied to multivariate regression in chemometrics. The main reason is that CCA cannot be applied to n << p data, because the within-set covariance matrices it must invert are then singular. We therefore applied regularized CCA (RCCA), which adds an l2-norm penalty to CCA. In this study, we formulated PLS, CCA, RCCA, kernel PLS and kernel CCA as generalized eigenvalue problems. We then applied PLS and CCA to two toy problems to clarify the features of these methods. Finally, we applied PLS and RCCA to GC-MS data analyzed to address the problem of ranking Japanese green tea. We found that the optimal number of latent variables for the regression model, as determined by leave-one-out cross-validation (LOOCV), was significantly smaller for RCCA than for PLS.
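The claim that PLS, CCA and RCCA share a generalized eigenvalue form can be illustrated with a short sketch. It assumes the common symmetric block formulation A w = rho B w, where A holds the cross-covariance C_xy off the diagonal and B is the identity (PLS), blockdiag(C_xx, C_yy) (CCA), or blockdiag(C_xx + k_x I, C_yy + k_y I) (RCCA). This is an assumed textbook formulation, not necessarily the exact scaling used in the paper; the function `first_latent_pair` and its parameters are hypothetical, only the first latent-variable pair is computed, and the kernel variants are omitted. `scipy.linalg.eigh` requires B to be positive definite, which is exactly what plain CCA loses when n << p and what the l2 penalty restores.

```python
import numpy as np
from scipy.linalg import eigh


def first_latent_pair(X, Y, method="pls", reg=(0.0, 0.0)):
    """Hypothetical helper: first weight pair from A w = rho B w.

    method: "pls", "cca" or "rcca" (names chosen for this sketch only).
    reg: (k_x, k_y) l2 penalties added to C_xx and C_yy for RCCA.
    Returns (rho, w_x, w_y) for the largest generalized eigenvalue.
    """
    Xc = X - X.mean(axis=0)  # center each variable
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)  # within-set covariance of X (p x p)
    Cyy = Yc.T @ Yc / (n - 1)  # within-set covariance of Y (q x q)
    Cxy = Xc.T @ Yc / (n - 1)  # cross-covariance (p x q)
    p, q = Cxy.shape
    Zpq, Zqp = np.zeros((p, q)), np.zeros((q, p))

    # A couples the two blocks only through the cross-covariance.
    A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])

    if method == "pls":
        B = np.eye(p + q)  # unit-norm weight constraints
    elif method == "cca":
        B = np.block([[Cxx, Zpq], [Zqp, Cyy]])  # singular when n <= p
    elif method == "rcca":
        kx, ky = reg
        B = np.block([[Cxx + kx * np.eye(p), Zpq],
                      [Zqp, Cyy + ky * np.eye(q)]])  # l2-regularized
    else:
        raise ValueError(f"unknown method: {method}")

    # eigh solves the symmetric-definite problem; eigenvalues ascend,
    # so the last one is the largest (top singular value or correlation).
    vals, vecs = eigh(A, B)
    w = vecs[:, -1]
    return vals[-1], w[:p], w[p:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 50))  # n << p, as in the abstract
    y = rng.standard_normal((10, 1))
    rho, wx, wy = first_latent_pair(X, y, "rcca", reg=(1.0, 1.0))
    print("RCCA first eigenvalue:", rho)
```

For n > p and zero penalties the "rcca" branch reduces to plain CCA; with n << p only the "pls" and "rcca" branches give a positive definite B.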