The present paper discusses principles and methods of sensitivity analysis in multivariate methods such as principal component analysis and correspondence analysis, which are formulated as eigenvalue problems, and in various methods of exploratory factor analysis, which contain eigenvalue problems in their determining equations. The major mathematical tools are the influence function introduced by Hampel (1974) and the perturbation theory of eigenvalue problems. Theoretical influence functions, empirical influence functions and deleted empirical influence functions are used for detecting influential observations in those multivariate methods. Numerical examples are shown for illustration.
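As a rough illustration of the case-deletion idea behind a deleted empirical influence function, the following sketch measures how much the largest eigenvalue of a two-variable sample covariance matrix changes when each observation is removed in turn. The scaling by n - 1, the sign convention, the function names and the toy data are all assumptions made for this example; the abstract does not give the paper's exact definitions.

```python
import math


def covariance_2d(points):
    """Return (sxx, sxy, syy) of the sample covariance matrix, divisor n - 1."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def largest_eigenvalue(points):
    """Largest eigenvalue of the 2x2 covariance matrix, in closed form."""
    sxx, sxy, syy = covariance_2d(points)
    half_trace = (sxx + syy) / 2
    return half_trace + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)


def deletion_influence(points):
    """Assumed scaling: (n - 1) * (eigenvalue with all data - eigenvalue without i)."""
    n = len(points)
    full = largest_eigenvalue(points)
    return [
        (n - 1) * (full - largest_eigenvalue(points[:i] + points[i + 1:]))
        for i in range(n)
    ]


# Hypothetical toy data: five points near a line plus one far-out observation.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (12.0, 11.5)]
influence = deletion_influence(data)
most_influential = max(range(len(data)), key=lambda i: abs(influence[i]))
print(most_influential, [round(v, 2) for v in influence])
```

In this toy data set the last observation stretches the first principal axis, so deleting it changes the largest eigenvalue far more than deleting any other point.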
The purpose of this paper is to review the problem of evaluating whether a given set of variables is redundant in the presence of the remaining variables. The redundancy problem is mainly discussed in discriminant analysis, canonical correlation analysis and principal component analysis. We give some equivalent formulations for the redundancy of a variable subset, which are appropriate for intuitive interpretation, inferential structure and mathematical treatment, respectively. The LR test and the AIC criterion for the redundancy of a variable subset are also given. We briefly discuss the redundancy problem in other multivariate models and some related topics.
Constrained principal component analysis (CPCA) was proposed by Takane and Shibayama (1991) for structural analysis of multivariate data. In this method the data are first decomposed into several components according to external information. The decomposed submatrices are then subjected to principal component analysis (PCA) to explore possible structures within the submatrices. The method thus combines two major conventional multivariate analysis techniques, multiple regression analysis and PCA, in a unified framework. This paper illustrates the basic model, computational methods, various uses and extensions of CPCA. An illustrative example is given, and the relative merits and demerits of CPCA are discussed in relation to the analysis of covariance structure (ACOVS) approach.
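The two-step idea (decompose by external information, then analyze each part) can be sketched in a deliberately simplified setting. The example below assumes that the only external information is a grouping of the rows, so that the projection onto the external information reduces to replacing each centered row by its group mean. It also assumes two variables, so the leading eigenvalue has a closed form. The data, the group labels and all function names are hypothetical; this is not the general model or the computational method of the paper.

```python
import math


def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def center(rows):
    means = column_means(rows)
    return [[v - m for v, m in zip(r, means)] for r in rows]


def decompose_by_groups(rows, groups):
    """Split centered data Z into a group-mean part and a within-group residual."""
    z = center(rows)
    explained = [None] * len(z)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        group_mean = column_means([z[i] for i in idx])
        for i in idx:
            explained[i] = list(group_mean)
    residual = [[a - b for a, b in zip(zr, er)] for zr, er in zip(z, explained)]
    return z, explained, residual


def sum_of_squares(rows):
    return sum(v * v for r in rows for v in r)


def leading_eigenvalue(rows):
    """Largest eigenvalue of the 2x2 cross-product matrix (two columns only)."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    return (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)


# Hypothetical data: six cases, two variables, two groups of rows.
rows = [[1.0, 2.0], [2.0, 2.5], [1.5, 1.0], [6.0, 7.0], [7.0, 6.5], [6.5, 8.0]]
groups = ["a", "a", "a", "b", "b", "b"]

z, explained, residual = decompose_by_groups(rows, groups)
print("total SS:", round(sum_of_squares(z), 4))
print("explained SS:", round(sum_of_squares(explained), 4),
      "leading eigenvalue:", round(leading_eigenvalue(explained), 4))
print("residual SS:", round(sum_of_squares(residual), 4),
      "leading eigenvalue:", round(leading_eigenvalue(residual), 4))
```

Because the group-mean part is an orthogonal projection of the centered data, the total sum of squares splits exactly into the explained and residual parts, and each part can then be examined by PCA separately.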
In the last decade, the field of multivariate analysis has shown remarkable development. However, some of the methods have not satisfied users' needs for a robust and easy-to-handle tool. The purpose of this article is to clarify the content of users' needs and to communicate them to the makers and developers of these methods. Eighty-two subjects (mostly social psychologists) were asked by questionnaire what kinds of dissatisfaction they had experienced in using multivariate analyses. The questionnaire covered the following 10 methods: factor analysis, principal component analysis, canonical correlation analysis, multiple regression analysis, analysis of covariance, discriminant analysis, cluster analysis, theory of quantification, LISREL, and MDS. Users reported wanting methods which are: (1) robust and stable rather than mathematically sophisticated, (2) able to analyze the formation and change of interacting processes, (3) not restricted by excessive mathematical assumptions, and (4) supported by systems which easily select the best method out of a complex program package. The users also wanted to know (5) the limitations of each application rather than the usefulness of the method, and (6) criteria for selecting the most suitable method for the data.
This paper introduces a computationally useful aspect of Mandarin revealed by statistical analysis of the 6321 most frequently used Chinese words of Suen (1986). The statistics extracted herein include: (1) frequency distributions of consonants, vowels, phonemes and tones, (2) word-length counts in syllables and phonemes, (3) entropies and primary as well as secondary conditional entropies of phonemes, (4) frequency distributions of short-distance words based on consonants, vowels, and/or phonemes, and (5) substitution pairs of consonants, vowels and phonemes. These statistical properties provide information of fundamental importance in computer processing of the Chinese language. For example, an error-correcting scheme for a single Chinese character, even of known tone and part of speech, is difficult because the number of Chinese characters at a Levenshtein distance of 1 averages 10.38 per word. If we consider two-character Chinese words, however, the average number of words at the same Levenshtein distance of 1 falls to 3.19 per word, without taking the tone and parts of speech into account. This can be further reduced, drastically, to 0.26 per word if such linguistic information is fully utilized. We believe that effective use of the statistical properties of the language thus extracted should be more fully explored in implementing an efficient error-correcting scheme in the machine processing of Mandarin.
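The two kinds of statistics mentioned above, the average number of words at Levenshtein distance 1 and the entropy of phonemes, can be sketched as follows. The lexicon here is a made-up toy list of phoneme sequences, not the Suen (1986) word list, and the function names are hypothetical; the sketch also ignores tone and part of speech.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def average_neighbours(lexicon, distance=1):
    """Average number of other lexicon entries at exactly the given distance."""
    total = 0
    for word in lexicon:
        total += sum(1 for other in lexicon
                     if other != word and levenshtein(word, other) == distance)
    return total / len(lexicon)


def entropy_bits(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical toy lexicon: each word is a tuple of phoneme symbols.
toy_lexicon = [("m", "a"), ("b", "a"), ("m", "i"), ("b", "i", "ng"), ("sh", "u")]

print("average distance-1 neighbours:", average_neighbours(toy_lexicon))
phonemes = [p for word in toy_lexicon for p in word]
print("phoneme entropy (bits):", round(entropy_bits(phonemes), 3))
```

A lower average neighbour count means that a corrupted word has fewer plausible corrections, which is the sense in which longer words and additional linguistic information make error correction easier.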
The incidence of habitual aborters and the abortion rates of their next untreated pregnancy are difficult to estimate because habitual aborters (those who have aborted more than three times in succession) rarely continue their pregnancy without being treated. This information is indispensable to obstetricians who treat patients with a new treatment and evaluate its efficacy. The method of moments and the maximum likelihood method are used to estimate these parameters based on a simple model of habitual abortion. The abortion rates of the third and fourth pregnancies of habitual aborters with three consecutive abortions were 0.306 and 0.313, respectively, according to the two-causes group model; these are smaller than the observed rates. The abortion rate of the fifth pregnancy was 0.314 according to the two-causes group model and 0.472 according to the three-causes group model.
Hayashi's third method of quantification was presented by C. Hayashi in 1956 as a forerunner of correspondence analysis developed in France. Many artificial data sets for it have been proposed by several authors, Guttman, Iwatsubo and Otsu among others. Almost all of them have a one-dimensional structure; as far as the author is aware, the circular data dealt with by Iwatsubo are the only exception. This paper has two objectives. First, the author presents a two-parameter family of artificial data sets with a one-dimensional structure, which includes two data sets due to Otsu, Case 2 and Case 3, and the Guttman data as special cases. For almost all artificial data sets with a one-dimensional structure, the second axis is interpreted as Guttman's intensity, which is an artifact without any substantial meaning. As the second objective, item patterns with a two-dimensional structure are presented. Two types are dealt with, a square type and a triangular type. For both types, the first axis is interpreted as the length of the two-dimensional structure, while the number of the axis at which the second dimension, the width, appears is not necessarily two but depends on the value of the length.