Abstract
The parallel processing of statistical data analysis has not been studied much so far. In this paper, we aim at high-speed execution of statistical data analysis on the parallel computer "QCD-PAX". Parallel processing is performed by allocating an equal 1/16 share of the data to each processing unit (PU). For the matrix calculations in multiple regression analysis, we allocate the 1st row to PU[0], the 2nd row to PU[1], and so on. If the dimension of the matrix is more than 16, we allocate the 17th row to PU[0], the 18th row to PU[1], and so on, so that the load of each PU is equal. We evaluate the efficiency of parallel processing for basic statistics, multiple regression analysis, and principal component analysis. As a result, in the calculation of basic statistics, the efficiency of parallel processing improves as the number of samples increases, independent of the number of variables. We obtain an efficiency of over 90% with more than 5000 samples. In multiple regression analysis, the efficiency of parallel processing improves as either the number of variables or the number of samples increases. An efficiency of 87.1% was obtained with 32 variables and 10000 samples. In principal component analysis, the efficiency did not improve as the number of variables increased, but it did improve as the number of samples increased. We obtain an efficiency of 66.5% with 16 variables and 10000 samples. We conclude that parallel processing of statistical data analysis is effective for massive data.
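The row allocation described above can be illustrated with a minimal sketch. It is not the QCD-PAX implementation. It assumes 16 PUs (as implied by the 1/16 data share) and 0-based row indices, so the "1st row" is row 0. The function names `assign_rows` and `load_per_pu` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of cyclic (round-robin) row allocation across PUs.
# Assumption: 16 PUs, 0-based row indices; names are hypothetical.

NUM_PUS = 16


def assign_rows(num_rows, num_pus=NUM_PUS):
    """Return a dict mapping each PU index to the list of row indices it holds.

    Row i goes to PU (i mod num_pus), so row 16 (the 17th row) wraps
    around to PU[0], row 17 (the 18th row) to PU[1], and so on.
    """
    allocation = {pu: [] for pu in range(num_pus)}
    for row in range(num_rows):
        allocation[row % num_pus].append(row)
    return allocation


def load_per_pu(num_rows, num_pus=NUM_PUS):
    """Return the number of rows held by each PU, in PU order."""
    allocation = assign_rows(num_rows, num_pus)
    return [len(allocation[pu]) for pu in range(num_pus)]


if __name__ == "__main__":
    alloc = assign_rows(32)
    print(alloc[0])         # rows held by PU[0]: [0, 16]
    print(load_per_pu(32))  # every PU holds 2 rows
```

With this scheme the row counts of any two PUs differ by at most one, which is the sense in which the load is balanced when the matrix dimension is not a multiple of 16.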