Journal of the Japanese Society of Computational Statistics
Online ISSN : 1881-1337
Print ISSN : 0915-2350
ISSN-L : 0915-2350
Volume 15, Issue 2
  • THE ROLE OF STATISTICS IN PHARMACOGENOMICS
    Sandra Close Kirkwood
    2003 Volume 15 Issue 2 Pages 3-13
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    The sequence of the human genome released earlier this year is a significant scientific achievement. The goal of pharmacogenomics is the application of genetic information and technology to develop better therapeutics or to guide the use of pharmaceuticals in the treatment or prevention of disease. Statistical theory and probability will play an expanded role in interpreting genetic information through the development of new analytical methodology and the novel application of traditional statistical theory. Examples in gene mapping and microarray expression analysis will be used to broadly illustrate the essential role of statistical theory in pharmacogenomics research. Specific gene mapping methodologies discussed include linkage analysis, linkage disequilibrium studies, and haplotype analysis. Application of statistical theory to gene chip experiments to obtain high-quality data including experimental design, minimizing variability, and well-controlled verification strategies and applications to identify gene expression differences between experimental groups will be reviewed. The combination of statistical applications and genomic technologies is key to understanding the genetic differences that identify patients susceptible to disease, stratify patients by clinical outcome, indicate treatment response, or predict adverse event occurrences.
    Download PDF (1643K)
  • Wing K. Fung
    2003 Volume 15 Issue 2 Pages 15-26
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    DNA profiling is a highly discriminating method for human identification. In this paper, the background of forensic DNA analysis is introduced. Problems in computational statistics, such as Hardy-Weinberg and linkage equilibria, the representativeness of population databases, and population substructure and relatedness, are addressed. In particular, two problems are discussed in more detail, namely paternity determination and the statistical evaluation of complex forensic DNA mixtures. Some general procedures are suggested for handling these problems. Problems which remain to be solved are also discussed.
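    As a minimal illustration of the product rule that underlies such match-probability calculations (assuming Hardy-Weinberg and linkage equilibrium, with made-up allele frequencies, not figures from the paper), a Python sketch:

      # Random-match probability under Hardy-Weinberg and linkage equilibrium:
      # genotype frequency is p^2 for a homozygote and 2pq for a heterozygote,
      # and independent loci multiply (the "product rule").

      def genotype_freq(p, q=None):
          """Frequency of a genotype with allele frequencies p (and q if heterozygous)."""
          return p * p if q is None else 2 * p * q

      # Hypothetical allele frequencies at three loci (not from the paper).
      loci = [(0.12, 0.08), (0.21, None), (0.05, 0.30)]

      match_prob = 1.0
      for p, q in loci:
          match_prob *= genotype_freq(p, q)

      print(f"Random-match probability: {match_prob:.3e}")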
    Download PDF (1410K)
  • Byron Wm. Brown, Jr., Jerry Halpern
    2003 Volume 15 Issue 2 Pages 27-35
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Implicit assumptions in the current use of the Neyman-Pearson approach to sample size calculations when planning medical clinical trials are noted. The pros and cons of using decision theory instead of, or to supplement, Neyman-Pearson when planning such trials are discussed. Why the sample sizes suggested by these two approaches are often very different is explained. A computer program is presented and described for the two-arm completely randomized trial with a binary endpoint. This program is available on the Web, requires simple practical input, and is intended to be easily used by the clinical scientist and biostatistician. An example of its use is given which illustrates the additional quantitative insights afforded by using decision theory together with the more customary Neyman-Pearson approach to design clinical trials.
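    For orientation, here is the customary Neyman-Pearson side of the calculation for a two-arm trial with a binary endpoint, a standard normal-approximation formula; the paper's decision-theoretic program is not reproduced here:

      # Textbook sample size per arm for comparing two proportions
      # (normal approximation with pooled variance under H0).
      from scipy.stats import norm

      def n_per_arm(p1, p2, alpha=0.05, power=0.80):
          z_a = norm.ppf(1 - alpha / 2)
          z_b = norm.ppf(power)
          p_bar = (p1 + p2) / 2
          num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
          return num / (p1 - p2) ** 2

      print(round(n_per_arm(0.40, 0.55)))   # about 173 subjects per arm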
    Download PDF (1309K)
  • ENABLING CLINICALLY BASED KNOWLEDGE DISCOVERY IN PHARMACY CLAIMS DATA: AN APPLICATION IN BIOINFORMATICS
    Jason Jones, Doug Stahl, Michael Nichol, Stanley P. Azen
    2003 Volume 15 Issue 2 Pages 39-47
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    This paper describes the development, application, and evaluation of a set of methods for transforming standard pharmacy claims data into a clinically relevant database that can facilitate healthcare research. Prescription claims data represent relatively inexpensive and largely unexploited exploratory ground for understanding the relationships between prescription treatments and their healthcare and cost outcomes.
    A web-based, graphical interface was developed to solicit clinical expert opinions about how claims should be combined into prescription treatments. A classification tree methodology was then applied to the database in an attempt to induce expert decisions based on a flexible set of predictor variables generated directly from the prescription claims.
    Two different classification tree approaches and four versions of the predictor variable sets (PVSs) were compared with each other and with a fixed heuristic for data transformation in a sample of 11,654 expert-reviewed claim pairs. The model-based classification rules significantly outperformed the simple rule when claim pairs comprised different drugs and performed as well as the simple rule when the drugs were the same.
    The best combination of classification tree approach and PVS was used to generate a set of rules that was subsequently applied to a larger dataset and used to generate and describe prescription treatment episodes. A sample analysis was conducted using the output database to specify inclusion/exclusion criteria, group assignment, stratification, and outcomes such as treatment discontinuation. Both visual and formal techniques were employed in the manner typical of an outcomes or pharmacoeconomic research endeavor.
    Download PDF (1196K)
  • STATISTICAL ISSUES IN THE DRUG EVALUATION PROCESS : SOME ISSUES IN APPLYING THE ICH GUIDELINES
    Tatsuya Isomura, Toshimitsu Hamasaki, Masashi Goto
    2003 Volume 15 Issue 2 Pages 49-64
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    The International Conference on Harmonization (ICH) has recently developed the ICH guidelines, which are common guidelines for regulatory requirements in the ICH regions, i.e., Japan, the United States (US), and Europe. The guidelines have been implemented in the three regions, and the people involved in drug evaluation in the pharmaceutical industry, such as statisticians and pharmaceutical scientists, refer to the guidelines in their daily work on the drug evaluation process. This paper aims to discuss some statistical issues in the drug evaluation process that are commonly encountered when applying the ICH guidelines. It also provides a few practical ideas for the future drug evaluation process. In this paper the emphasis is placed on the following ICH guidelines: “Dose-Response Information to Support Drug Registration” (ICH-E4), “Ethnic Factors in the Acceptability of Foreign Clinical Data” (ICH-E5) and “Choice of Control Group in Clinical Trials” (ICH-E10).
    Download PDF (2268K)
  • STOCHASTIC PROCESS MODEL FOR MEDICAL DECISION-MAKING
    Takashi Namatame, Yumi Asahi
    2003 Volume 15 Issue 2 Pages 65-70
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    In this paper, we describe various stochastic process models for medical decision-making, mainly Markov models. Particular attention is given to modeling with the Markov cohort model and the Markov decision process model, and to the computational aspects and limitations of stochastic models. We also discuss current problems: how to collect the required data, and the relation between the model solution and the actual decision.
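    A minimal sketch of a Markov cohort model of the kind discussed, with invented states, transition probabilities and utilities:

      # A cohort distributed over health states is pushed through a transition
      # matrix each cycle and per-state rewards (e.g. QALYs) are accumulated.
      import numpy as np

      P = np.array([[0.85, 0.10, 0.05],     # well -> well / sick / dead
                    [0.00, 0.70, 0.30],     # sick -> sick / dead
                    [0.00, 0.00, 1.00]])    # dead is absorbing
      reward = np.array([1.0, 0.6, 0.0])    # utility per cycle in each state

      cohort = np.array([1.0, 0.0, 0.0])    # everyone starts in "well"
      total = 0.0
      for _ in range(20):                   # 20 cycles
          total += cohort @ reward
          cohort = cohort @ P
      print(f"Expected QALYs over 20 cycles: {total:.2f}")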
    Download PDF (1912K)
  • ANALYSING LONGITUDINAL CLAIMS COST DATA : A ZERO-AUGMENTED GAMMA MIXED REGRESSION APPROACH
    Andy H. Lee, Kelvin K. W. Yau
    2003 Volume 15 Issue 2 Pages 73-79
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    This paper presents a zero-augmented gamma mixed regression model to analyse longitudinal data with many zeros. The objective in occupational health is to reduce both the incidence of injury and the mean claims cost once an injury has occurred. However, population-based claims cost data often contain many zero observations (no claim). The empirical distribution thus comprises a point mass at zero mixed with a non-degenerate parametric component. The likelihood function can be factorised into two orthogonal components, corresponding to the effects of covariates on claim incidence and on the magnitude of claims conditional on claims being made. Random effects are incorporated into the respective linear predictors to account for correlation between observations from the same individual. The mixed regression model is applied to evaluate the effectiveness of an occupational intervention program.
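    A rough two-part sketch of the fixed-effects core of such a model (random effects omitted; the data and variable names are illustrative, not the paper's):

      # Part 1: logistic model for whether a claim occurs.
      # Part 2: gamma GLM with log link for the positive claim costs.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 500
      x = rng.normal(size=n)
      X = sm.add_constant(x)
      occurs = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.8 * x))))   # claim indicator
      cost = np.where(occurs == 1,
                      rng.gamma(2.0, np.exp(0.5 + 0.3 * x) / 2.0), 0.0)

      logit = sm.GLM(occurs, X, family=sm.families.Binomial()).fit()
      pos = cost > 0
      gamma = sm.GLM(cost[pos], X[pos],
                     family=sm.families.Gamma(sm.families.links.Log())).fit()
      print(logit.params, gamma.params)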
    Download PDF (713K)
  • ROBUST ANALYSIS OF LONGITUDINAL DATA
    Bo Fu, Wing K. Fung, Xuming He
    2003 Volume 15 Issue 2 Pages 81-87
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Two robust methods for the analysis of longitudinal data are discussed. The marginal linear model with correlated observations within individuals is employed. We summarize the literature on robust methods and suggest modifications of Huggins' and Jung's estimates that simplify the procedure for the analysis of longitudinal data. The small-sample behaviours of the two estimates are investigated by simulation under various situations, and a real data set is analysed. This paper provides a useful reference for practitioners on the choice of robust methods in the analysis of longitudinal data.
    Download PDF (762K)
  • PARTIAL LINEAR MODELS WITH HETEROSCEDASTIC VARIANCES
    Hua Liang, Wolfgang Härdle
    2003 Volume 15 Issue 2 Pages 89-104
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Consider the partial linear heteroscedastic model Y_i = X_i^T β + g(T_i) + σ_i e_i, 1 ≤ i ≤ n, with random design variables (X_i, T_i), response variables Y_i, and unknown regression function g(·). We assume that the errors are heteroscedastic, i.e., σ_i^2 ≠ const., where the e_i are i.i.d. random errors with mean zero and variance 1. In this partial linear heteroscedastic model, we consider situations in which the variance is an unknown smooth function of exogenous variables, of the nonlinear variables T_i, or of the mean response X_i^T β + g(T_i). Under general assumptions, we construct an estimator of the regression parameter vector β that is asymptotically equivalent to the weighted least squares estimator with known variances. The sample-splitting technique is adopted in constructing the estimator.
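    One hedged sketch of this kind of estimator (a Robinson-type double-residual fit followed by weighted least squares with a nonparametrically estimated variance function; the paper's sample-splitting construction is more careful than this):

      import numpy as np

      def nw_smooth(t, y, h=0.1):
          """Nadaraya-Watson smoother of y on t with a Gaussian kernel."""
          w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
          return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

      rng = np.random.default_rng(1)
      n = 300
      t = rng.uniform(size=n)
      x = rng.normal(size=n)
      sigma = 0.5 + t                      # heteroscedastic: variance depends on T
      y = 1.5 * x + np.sin(2 * np.pi * t) + sigma * rng.normal(size=n)

      # Remove the nonparametric component: residualize y and x on t.
      ry = y - nw_smooth(t, y)
      rx = x - nw_smooth(t, x)
      beta_ols = (rx @ ry) / (rx @ rx)     # unweighted estimate of beta

      # Estimate the variance function from squared residuals, then reweight.
      var_hat = np.maximum(nw_smooth(t, (ry - rx * beta_ols) ** 2), 1e-6)
      w = 1.0 / var_hat
      beta_wls = (rx * w) @ ry / ((rx * w) @ rx)
      print(beta_ols, beta_wls)            # both should be near 1.5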
    Download PDF (1363K)
  • NONPARAMETRIC AND SEMIPARAMETRIC MODELS IN COMPARISON OF OBSERVATIONS OF A PARTICLE-SIZE DISTRIBUTION
    Walter Liggett
    2003 Volume 15 Issue 2 Pages 105-122
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Testing hypotheses about pairs of unnormalized histograms motivates this paper. The histograms contain particle counts for particle-size intervals. The analysis involves generalized-linear-model fitting of cubic splines with irregularly-spaced knots. Of interest is testing the null hypothesis that two sets of particle counts correspond to intensity functions that differ only by a scale factor and a constant shift in horizontal registration. An unknown smooth function is common to the two intensities. The alternative hypothesis is that in addition, the difference between the two intensities is also an unknown smooth function. We consider three approaches to knot placement. First is specification of so many knots that adequate representations of the unknown functions cannot be doubted. Second is data-driven choice of knots. Third is choice of knots based on prior knowledge of what intensity differences are plausible. For the data at hand, we show that specification of too many knots leads to tests with too little power and that data-driven knot selection can lead to false rejection of the null hypothesis. The data at hand seem to call for use of prior knowledge to construct a semiparametric model that incorporates the distinction between the two hypotheses in the parametric part.
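    A generic sketch of the modelling device involved, Poisson GLM fitting of a cubic B-spline with chosen knots to binned counts (knot positions and data invented, not the paper's):

      import numpy as np
      import statsmodels.api as sm
      from patsy import dmatrix

      rng = np.random.default_rng(2)
      size = np.linspace(0.1, 10.0, 60)            # particle-size bin midpoints
      counts = rng.poisson(np.exp(3 - 0.5 * (np.log(size) - 1) ** 2))

      # Cubic B-spline basis with irregularly spaced knots.
      basis = dmatrix("bs(size, knots=(0.5, 1, 2, 5), degree=3)", {"size": size})
      fit = sm.GLM(counts, basis, family=sm.families.Poisson()).fit()
      print(fit.deviance)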
    Download PDF (1679K)
  • RIP-GAMS AND CLASSIFICATION TREES IN QUANTITATIVE MRI
    Michael G. Schimek
    2003 Volume 15 Issue 2 Pages 123-134
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Our interest is the analysis of quantitative magnetic resonance imaging (MRI) data, most relevant in current human brain research. When generalized additive models (GAMs) are fitted to such data, the backfitting algorithm of S-Plus tends to fail due to serial correlation and concurvity. To accommodate the conditioning problems of the system matrix we introduce the new concept of relaxed iterative projection generalized additive models (RIP-GAM). While the RIP algorithm (also in S-Plus) does not seem to run into numerical trouble for our data set, backfitting shows slow or no convergence in some instances. In standard situations, however, both procedures give the same estimation results.
    Because little is known about the functional relationships between the quantitative MRI parameters such as mean diffusivity, magnetization transfer ratio or forward transfer rate and qualitative lesion-related (binary) variables from clinical MRI diagnostics, more exploratory evidence is required. Hence we fit GAMs and when necessary RIP-GAMs. In addition we apply classification trees for the validation of the selected variables. Even for a simple one-step lookahead procedure we obtain stable results which support the fitted GAMs. In conclusion, both nonparametric techniques are valuable tools for quantitative MRI research.
    Download PDF (1183K)
  • POWER-TRANSFORMATION MIXED EFFECTS MODEL WITH APPLICATIONS TO PHARMACOKINETICS
    Takashi Daimon, Masashi Goto
    2003 Volume 15 Issue 2 Pages 135-150
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Pharmacokinetics is the study of the time course of the disposition of a drug in the human body, for an individual or a group of individuals, on the basis of blood drug concentration data for each individual. In this paper, intending to characterize the pharmacokinetics of a group of individuals in aggregate, we propose the power-transformation mixed effects model and evaluate its performance with an example from the pharmacokinetic literature and a simulation experiment. In the literature example, we compared the performance of the power-transformation mixed effects model with that of the no-transformation mixed effects model; the power-transformation model dealt appropriately with the heteroscedasticity of the observed drug concentrations. In the simulation experiment, we evaluated the estimation of the fixed effects parameters and of the elements of the variance-covariance matrix of the random effects in both models, in a setting where the blood drug concentration data are heteroscedastic. We also evaluated the effects of the number of individuals, the choice of sampling times of the blood drug concentrations, and the variability of the error distribution on the parameter estimates in both models. The results indicate that the power-transformation mixed effects model is a very useful approach to pharmacokinetics.
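    Assuming the power transformation is of the familiar Box-Cox form (an assumption on our part, not a detail stated in the abstract), the transformation itself is simple:

      # Box-Cox-type power transform: (y^lambda - 1)/lambda, or log y at lambda = 0,
      # which can stabilize the variance of concentration data.
      import numpy as np

      def power_transform(y, lam):
          y = np.asarray(y, dtype=float)
          return np.log(y) if lam == 0 else (y ** lam - 1) / lam

      conc = np.array([0.8, 2.1, 5.5, 12.0, 30.0])   # hypothetical concentrations
      for lam in (0.0, 0.5, 1.0):
          print(lam, np.round(power_transform(conc, lam), 3))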
    Download PDF (1755K)
  • COMPUTATIONAL ISSUES IN INFORMATION-BASED GROUP SEQUENTIAL CLINICAL TRIALS
    KyungMann Kim, Anastasios A. Tsiatis, Cyrus R. Mehta
    2003 Volume 15 Issue 2 Pages 153-167
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Lan and DeMets (Biometrika 1983; 70: 659-663) introduced a flexible procedure for the monitoring of group sequential clinical trials based on the discretization of the Brownian motion process. Subsequently, Kim and DeMets (Biometrika 1987; 74: 149-154) developed a general procedure for the design of such clinical trials. A number of procedures have been proposed for statistical inference following group sequential tests, concerning P-values and the point and confidence-interval estimation of a parameter of interest such as the effect size or the treatment difference. In this article, computational issues are described for the design and monitoring of clinical trials with interim analyses based on group sequential methods, allowing possible early stopping for efficacy or safety, and for inference following early stopping. The computational procedures, as implemented in the commercial package EaSt (2000), are illustrated with an example of a lung cancer clinical trial.
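    Two standard Lan-DeMets error-spending functions are easy to evaluate; the stopping-boundary computation itself requires the recursive numerical integration implemented in packages such as EaSt and is not reproduced here:

      import numpy as np
      from scipy.stats import norm

      def spend_obf(t, alpha=0.05):
          # O'Brien-Fleming-like spending: very little alpha spent early.
          return 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)))

      def spend_pocock(t, alpha=0.05):
          # Pocock-like spending: alpha spent more evenly over time.
          return alpha * np.log(1 + (np.e - 1) * t)

      t = np.array([0.25, 0.5, 0.75, 1.0])   # information fractions
      print(spend_obf(t))
      print(spend_pocock(t))                 # both reach alpha at t = 1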
    Download PDF (8498K)
  • A STUDY OF THE SEQUENTIAL CONDITIONAL TEST FOR CONTINGENCY TABLES
    Toshio Sakata, Ryuichi Sawae
    2003 Volume 15 Issue 2 Pages 169-174
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    By using a program generating tables with fixed marginals via creation operators, we implemented the sequential conditional test for contingency tables and studied its performance, that is, mean sample size, significance level and power for 2×l tables, l=3, 4, 5.
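    A non-sequential building block, assuming the usual conditional setup: given the margins, a 2×3 table follows a multivariate hypergeometric law, which can be enumerated exactly for small tables:

      from itertools import product
      from math import comb

      row = (10, 12)                 # fixed row totals (invented)
      col = (8, 7, 7)                # fixed column totals
      N, r1 = sum(row), row[0]

      tables, probs = [], []
      for a in product(*(range(min(r1, c) + 1) for c in col)):
          if sum(a) == r1:           # the first row determines the whole table
              tables.append(a)
              probs.append(comb(col[0], a[0]) * comb(col[1], a[1])
                           * comb(col[2], a[2]) / comb(N, r1))

      print(len(tables), "tables; total probability =", round(sum(probs), 12))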
    Download PDF (646K)
  • INVARIANT CONFIDENCE SEQUENCES FOR SLIPPAGE OF NORMAL MEANS
    Takahiko Hara
    2003 Volume 15 Issue 2 Pages 175-179
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    In this paper, we consider a mean-slippage detecting problem in univariate normal populations. By using Robbins' inequality, we construct some invariant confidence sequences for slippage of normal means when the common variance is unknown.
    Download PDF (329K)
  • EXTENSIBILITIES OF A JAVA-BASED STATISTICAL SYSTEM
    Ikunori Kobayashi, Takeshi Fujiwara, Junji Nakano, Yoshikazu Yamamoto
    2003 Volume 15 Issue 2 Pages 183-192
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    A statistical system needs to be able to work with other systems in a flexible way and be easily extensible, because no one statistical system can implement all the features required by a wide variety of users. Recently, new ways of extending statistical systems have been brought about by the rapid development of software technologies, including the Java language. In this paper, we propose the use of several Java language techniques for realizing extensions of Jasp, a Java-based statistical system, and demonstrate three practical features of these extensions.
    The first practical feature is demonstrated in the ability to handle eXtensible Markup Language (XML) documents, which are the standard format for data transmission on the WWW. The second practical feature is the use of the Component Object Model (COM) interface of Microsoft Windows for communication, e.g., with Microsoft Excel. The third practical feature is the sharing of program resources from other statistical software products, and to perform this task we have extended Jasp by using a translator to execute functions written in the XploRe language.
    Download PDF (3854K)
  • EVALUATION OF EXECUTION TIME ON DATA ANALYSIS WITH PARALLEL VIRTUAL MACHINE
    Hiroyuki Minami, Yuriko Komiya, Masahiro Mizuta
    2003 Volume 15 Issue 2 Pages 193-199
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    We often have to analyze enormous data sets. A single personal computer can handle them, but the analysis may take a long time even on a well-specified machine, so a faster analysis environment is needed. A parallel computer with large computing power can provide one.
    Parallel Virtual Machine (PVM) is a popular library that turns many computers connected over a network into a single (virtual) parallel machine. If thousands of connected computers can be used concurrently, various data can be analyzed quickly with PVM.
    We have investigated the features of PVM through many simulations and found several of interest. Accordingly, we construct a generic experimental model of execution time in PVM. The model is applicable to most data analysis methods that can be implemented in master-slave style, that is, divided into one main part and several sub-parts.
    With this model, we evaluate turn-around time in relation to the amount of transferred data, the load (measured by execution time) on each slave computer, and the number of sub-jobs. The model is generic enough to estimate execution time for analysis methods such as the bootstrap and k-means, and to derive how many computers are required to complete an analysis within a given time.
    In this paper, we summarize our work with numerical examples and discuss some practical points about using the framework.
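    A minimal version of the kind of master-slave cost model described (the coefficients here are invented, not the paper's estimates):

      # Turn-around time = fixed start-up + per-job data transfer +
      # slave computation, which parallelizes across p machines.
      from math import ceil

      def turnaround(jobs, p, t_setup=0.5, t_transfer=0.02, t_compute=1.0):
          """Estimated time for `jobs` equal sub-tasks on p slave computers."""
          return t_setup + jobs * t_transfer + ceil(jobs / p) * t_compute

      for p in (1, 4, 16, 64):
          print(p, round(turnaround(jobs=256, p=p), 2))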
    Download PDF (741K)
  • NEIGHBORHOOD GRAPHS IN CLASSIFICATION PROBLEMS FOR SYMBOLIC DATA
    Manabu Ichino
    2003 Volume 15 Issue 2 Pages 203-216
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    This paper presents new neighborhood graphs useful for solving feature selection problems in pattern recognition for symbolic data, where each sample pattern is described not only by quantitative features but also by qualitative features. We introduce the Cartesian System Model (CSM) as a mathematical model for treating symbolic data. We then define the Generality Ordered Mutual Neighborhood Graph and the Generality Ordered Interclass Mutual Neighborhood Graph based on the CSM. These neighborhood graphs play central roles in examining the details of interclass structures. We outline the basic idea of our classifier using simple examples.
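    The classical mutual neighborhood idea underlying these graphs can be sketched as follows; the generality-ordered versions based on the CSM refine this for mixed feature types and are not reproduced here:

      # Objects i and j are joined when each is among the other's k nearest
      # neighbours (the classical mutual k-nearest-neighbour graph).
      import numpy as np

      def mutual_knn_edges(X, k=3):
          D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
          np.fill_diagonal(D, np.inf)                 # no self-neighbours
          nn = np.argsort(D, axis=1)[:, :k]           # each object's k nearest
          return [(i, int(j)) for i in range(len(X)) for j in nn[i]
                  if j > i and i in nn[j]]

      X = np.random.default_rng(9).normal(size=(12, 2))
      print(mutual_knn_edges(X))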
    Download PDF (2111K)
  • CLUSTERING ALGORITHMS AND KOHONEN MAPS FOR SYMBOLIC DATA
    Hans-Hermann Bock
    2003 Volume 15 Issue 2 Pages 217-229
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    This paper considers ‘symbolic’ data tables where variables take, as ‘values’, intervals, sets of categories, histograms etc. instead of single numbers or categories. After presenting some cases where this situation may occur, we concentrate on interval-type data and present methods for partitioning the underlying set of objects (rows of the data matrix) into a given number of homogeneous clusters. Our clustering strategies are typically based on a clustering criterion and generalize similar approaches in classical cluster analysis. Such methods are part of a general Symbolic Data Analysis described, e.g., in Bock and Diday (2000). Finally, we present a sequential clustering and updating strategy for constructing a Self-Organizing Map (SOM, Kohonen map) for visualizing symbolic interval-type data.
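    In the spirit of the interval-data clustering described (a sketch, not the paper's algorithm): a k-means-style partition using the Hausdorff distance max(|a−c|, |b−d|) between intervals [a, b] and [c, d], with endpoint averages as prototypes:

      import numpy as np

      def hausdorff_sq(u, v):
          # u, v: (n_vars, 2) arrays of [lower, upper] interval endpoints
          return float((np.abs(u - v).max(axis=1) ** 2).sum())

      def interval_kmeans(X, k, iters=20, seed=0):
          rng = np.random.default_rng(seed)
          centers = X[rng.choice(len(X), size=k, replace=False)]
          for _ in range(iters):
              labels = np.array([min(range(k), key=lambda j: hausdorff_sq(x, centers[j]))
                                 for x in X])
              centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centers[j] for j in range(k)])
          return labels, centers

      X = np.random.default_rng(3).uniform(0, 1, size=(40, 2, 2))
      X.sort(axis=2)                       # ensure lower <= upper in each interval
      labels, _ = interval_kmeans(X, k=3)
      print(np.bincount(labels))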
    Download PDF (1895K)
  • HIERARCHICAL AND PYRAMIDAL CLUSTERING FOR SYMBOLIC DATA
    Paula Brito
    2003 Volume 15 Issue 2 Pages 231-244
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    This paper presents a method for clustering a set of symbolic data where individuals are described by symbolic variables of various types: interval, categorical multi-valued or modal variables, which take into account the variability or uncertainty present in the data. Hierarchical and pyramidal clustering models are considered. The constructed clusters correspond to concepts, that is, they are maximal sets of individuals associated with a conjunction of properties relating to the variables such that they form necessary and sufficient conditions for cluster membership. More generally, the data may include hierarchical rules between variables as well.
    Download PDF (1563K)
  • BOOTSTRAP CALIBRATION AND EMPIRICAL LIKELIHOOD IN THE LOGISTIC REGRESSION MODEL
    Michele La Rocca
    2003 Volume 15 Issue 2 Pages 247-254
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    In this paper we introduce a bootstrap approximation for the sampling distribution of the empirical likelihood ratio statistic in the logistic regression model. Both classical and robust inference procedures are considered. Some results of a Monte Carlo experiment illustrate the effectiveness of the proposed approach.
    Download PDF (981K)
  • THE PERFORMANCE OF COMPUTER INTENSIVE METHODS FOR OVER-DISPERSED CATEGORICAL DATA
    Yoshimichi Ochi
    2003 Volume 15 Issue 2 Pages 255-264
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Several methods have been proposed to deal with over-dispersed categorical data, including parametric model extensions, quasi-likelihood methods, generalized estimating equations, and nonparametric approaches. In this paper, applications of computer-intensive methods, such as the jackknife and the bootstrap, are considered. The methods considered here take as baseline models the original ones used under the multinomial distribution assumption; for each resampled dataset, the maximum likelihood estimates under that assumption are obtained. The purpose of this paper is to evaluate the performance of the estimators of the regression parameters and their variances and to compare it with other procedures. We review the approaches and show the results of a simulation study.
    Download PDF (1200K)
  • MEASURES OF VARIATION EXPLAINED BY BINARY REGRESSION
    Norisuke Kawai, Masashi Goto
    2003 Volume 15 Issue 2 Pages 265-274
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    In binary regression models we are interested in not only the parameter estimates and significance of explanatory variables, but also the degree to which variation in the response variable can be explained by explanatory variables. In this paper, we compare the behavior of proposed measures of explained variation for binary regression models through several case studies and indicate which measures should be accepted in practice. Furthermore, the importance of distinguishing measures of explained variation and goodness-of-fit is discussed. In conclusion, we recommend routine evaluation of the measures of explained variation in binary regression together with an exhaustive model which allows us to test the adequacy of simpler models such as the logistic model.
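    Two widely used explained-variation measures, computable from fitted probabilities (which measures the paper ultimately recommends is a question for the text itself):

      # A sum-of-squares R^2 and McFadden's likelihood-ratio R^2 for a
      # binary response y with fitted probabilities p_hat.
      import numpy as np

      def r2_sos(y, p_hat):
          return 1 - np.sum((y - p_hat) ** 2) / np.sum((y - y.mean()) ** 2)

      def r2_mcfadden(y, p_hat):
          eps = 1e-12
          ll = np.sum(y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps))
          p0 = y.mean()
          ll0 = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
          return 1 - ll / ll0

      y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
      p_hat = np.array([0.2, 0.3, 0.8, 0.7, 0.6, 0.4, 0.9, 0.1])
      print(r2_sos(y, p_hat), r2_mcfadden(y, p_hat))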
    Download PDF (1114K)
  • RANK ASSOCIATION MEASURES FOR CONTINGENCY TABLES
    Osamu Sugano
    2003 Volume 15 Issue 2 Pages 275-280
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    We propose two association measures for contingency tables based on ranks. We illustrate the comparison between the new measures and traditional measures. The rank association measures make it easy to find detailed differences between two r×s tables. Thus, the merit of using rank statistics is illustrated by examples.
    Download PDF (521K)
  • STATISTICAL EVALUATION OF UMBRELLA DOSE-RESPONSE RELATIONSHIPS
    Mitsumasa Baba, Masaki Fujisawa, Wataru Sakamoto, Masashi Goto
    2003 Volume 15 Issue 2 Pages 281-293
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    In the drug development process, it is essential to assess the relationship (mechanism) between the dose of a drug and the response of a biological system following its administration, known as the “dose-response relationship”. Dose-response relationships are often evaluated under a monotonicity hypothesis. In practice, however, we may encounter non-monotonic dose-response relationships, such as the umbrella relationship, which makes interpretation somewhat problematic. In this paper, to assess such umbrella dose-response relationships, the cumulative dose logit model is proposed and applied to an example. To evaluate the properties of this model, some Monte Carlo studies are performed, and the results of the cumulative dose logit model are compared with those of the quadratic logit model. The comparison indicates that the cumulative dose logit model provides more stable estimates than the quadratic logit model in estimating the maximum effective dose, suggesting that the cumulative dose logit model is appropriate for assessing non-monotonic dose-response relationships.
    Download PDF (1535K)
  • VGAM FAMILY FUNCTIONS FOR CATEGORICAL AND GENETIC DATA
    Thomas W. Yee
    2003 Volume 15 Issue 2 Pages 295-304
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Vector generalized additive models (VGAMs) are a multivariate extension of generalized additive models (GAMs). VGAMs extend the GAM class mainly in two respects: they handle more than a single additive predictor, and consequently they fit models outside the exponential family. VGAMs provide a large framework; they give maximum likelihood estimates for a wide range of data types and models, such as univariate and multivariate distributions, categorical data analysis, time series, survival analysis, generalized estimating equations, correlated binary data, bioassay data and nonlinear least-squares problems. In this paper we briefly survey vector generalized linear models (VGLMs), VGAMs and an S-PLUS/R implementation called VGAM written by the author for general maximum likelihood estimation. Then we focus on specific family functions for categorical and genetic data, for example, the proportional odds, continuation ratio, adjacent categories and stereotype models for categorical data. The VGAM software library is freely available at http://www.stat.auckland.ac.nz/yee.
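    VGAM itself is S-PLUS/R software; as a rough Python analogue of one of its family functions, the proportional odds model for an ordered categorical response, here is statsmodels' ordinal regression (simulated data):

      import numpy as np
      import pandas as pd
      from statsmodels.miscmodels.ordinal_model import OrderedModel

      rng = np.random.default_rng(4)
      x = rng.normal(size=300)
      latent = 1.2 * x + rng.logistic(size=300)
      y = pd.Series(pd.cut(latent, [-np.inf, -1, 1, np.inf],
                           labels=["low", "mid", "high"]))

      # No intercept column in exog: the category thresholds play that role.
      fit = OrderedModel(y, x[:, None], distr="logit").fit(method="bfgs", disp=False)
      print(fit.params)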
    Download PDF (1021K)
  • GEOGRAPHICALLY WEIGHTED FUNCTIONAL MULTIPLE REGRESSION ANALYSIS: A NUMERICAL INVESTIGATION
    Yoshihiro Yamanishi, Yutaka Tanaka
    2003 Volume 15 Issue 2 Pages 307-317
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Functional regression analysis enables us to investigate the relationship among variables over time. When analyzing spatial data, however, we sometimes encounter cases where the regression coefficients do not remain fixed over space. The present paper proposes a method of geographically weighted functional regression analysis to analyze relationships among variables that vary over space as well as over time, borrowing the idea of Brunsdon et al. (1998), in which geographical weights are introduced into ordinary regression. Monte Carlo and bootstrap methods are used to perform a statistical test for spatial variability and to evaluate the reliability of the prediction. The proposed methods are illustrated using a real data set.
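    The core of (non-functional) geographically weighted regression in the spirit of Brunsdon et al. is a locally weighted least-squares fit, sketched here with invented data; the paper's functional extension applies the same idea over time as well:

      import numpy as np

      def gwr_coefs(sites, X, y, s0, bandwidth=1.0):
          d2 = ((sites - s0) ** 2).sum(axis=1)
          w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian kernel weights
          XtW = X.T * w
          return np.linalg.solve(XtW @ X, XtW @ y)      # (X'WX)^{-1} X'Wy

      rng = np.random.default_rng(5)
      sites = rng.uniform(0, 10, size=(200, 2))
      X = np.column_stack([np.ones(200), rng.normal(size=200)])
      slope = 1 + 0.2 * sites[:, 0]                     # coefficient drifts over space
      y = X[:, 1] * slope + rng.normal(size=200)

      for s0 in ([1.0, 5.0], [9.0, 5.0]):
          print(s0, gwr_coefs(sites, X, y, np.array(s0)))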
    Download PDF (2050K)
  • DISSIMILARITY AND RELATED METHODS FOR FUNCTIONAL DATA
    Shuichi Tokushige, Koichi Inada, Hiroshi Yadohisa
    2003 Volume 15 Issue 2 Pages 319-326
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Functional data analysis, as proposed by Ramsay (1982), has been attracting many researchers. The most popular approach in recent studies of functional data has been to extend the statistical methods for usual data to functional data. Ramsay and Silverman (1997), for example, proposed regression analysis, principal component analysis, canonical correlation analysis, linear models, etc. for functional data. In this paper, we propose several dissimilarities of functional data. We discuss comparison of these dissimilarities by using the cophenetic correlation coefficient and the sum of squares. Our concern is the effect of dissimilarity on the result of analysis that is applied to dissimilarity data; e.g., cluster analysis.
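    One natural dissimilarity between two functions observed on a common grid is the L2 distance, approximated by the trapezoidal rule (whether this is among the dissimilarities the paper proposes is not asserted here):

      import numpy as np

      def l2_dissimilarity(f, g, t):
          # sqrt of the integral of (f - g)^2, via the trapezoidal rule.
          d2 = (f - g) ** 2
          return np.sqrt(np.sum((d2[:-1] + d2[1:]) / 2 * np.diff(t)))

      t = np.linspace(0, 1, 101)
      f = np.sin(2 * np.pi * t)
      g = np.sin(2 * np.pi * t + 0.3)
      print(l2_dissimilarity(f, g, t))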
    Download PDF (719K)
  • MULTIDIMENSIONAL SCALING FOR DISSIMILARITY FUNCTIONS WITH CONTINUOUS ARGUMENT
    Masahiro Mizuta
    2003 Volume 15 Issue 2 Pages 327-333
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    In this paper, a method of Multidimensional Scaling (MDS) for dissimilarity functions with a continuous argument is discussed. MDS is one of the important methods for data analysis, and most conventional MDS methods suppose that dissimilarities are real values. Nowadays, however, the types of data sets dealt with in data analysis have expanded. Ramsay and Silverman proposed the concept of Functional Data Analysis (FDA), which deals with functional data, or with data treated as functional data. When dissimilarity data among n objects are given as functions of a variable t, we would like a functional version of MDS; the aim of the method is to derive a functional configuration X(t) that represents the dissimilarity functional data. A method of MDS for dissimilarity functions with a discrete argument is also discussed, because most dissimilarity functions are given as discrete values when implemented on a computer.
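    Classical (Torgerson) MDS applied pointwise in t is one naive starting point; nothing in this sketch enforces the smoothness of X(t) across t that a functional method must address:

      import numpy as np

      def classical_mds(D, dim=2):
          # Double-center the squared dissimilarities, then eigendecompose.
          n = D.shape[0]
          J = np.eye(n) - np.ones((n, n)) / n
          B = -0.5 * J @ (D ** 2) @ J
          vals, vecs = np.linalg.eigh(B)
          idx = np.argsort(vals)[::-1][:dim]
          return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

      rng = np.random.default_rng(6)
      P = rng.normal(size=(5, 2))                        # true configuration
      D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
      print(classical_mds(D))                            # recovers P up to rotation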
    Download PDF (576K)
  • COMPUTER INTENSIVE TRIALS TO DETERMINE THE NUMBER OF VARIABLES IN PCA
    Masaya Iizuka, Yuichi Mori, Tomoyuki Tarumi, Yutaka Tanaka
    2003 Volume 15 Issue 2 Pages 337-345
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Many criteria and procedures to select a reasonable subset of variables in the context of principal component analysis have been derived, but there still exist problems to determine how many variables should be selected as well as to evaluate the performance of the selection methods. To deal with these problems, two computer intensive methods are performed: a bootstrap method which is applied to the given subsets of variables and a cross validation method which is modified for principal component analysis. The results in some numerical examples offer information and some guidance to determine the number of variables to be selected.
    Download PDF (923K)
  • SENSITIVITY ANALYSIS IN LATENT CLASS ANALYSIS
    Tsukio Morita, Yutaka Tanaka
    2003 Volume 15 Issue 2 Pages 347-355
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    The present paper proposes a procedure for evaluating the stability of Green's solution of latent class analysis from the viewpoint of sensitivity analysis. For this purpose, the first order differential coefficients are derived for the quantities contained in the solution with respect to a perturbation parameter. A numerical example is given for illustration.
    Download PDF (769K)
  • MIXED DATA TYPE AND TOPOLOGICAL CLASSIFICATION
    Karl-Ernst Biebler, Bernd Jäger, Michael Wodny, Elke Below
    2003 Volume 15 Issue 2 Pages 357-360
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    The analysis of high dimensional data containing both continuous and discrete variables is a standard task in applied biometry. Statistical software packages offer classification procedures for continuous variables based on a suitable coordinate transformation of the finite-dimensional real data space. Such transformations are of an algebraic-topological nature, and their statistical interpretation requires additional assumptions on the probability distributions. For discrete variables, nonprobabilistic classification procedures based on certain metrics are available. We discuss a classification procedure for mixed binary and continuous data, combining the Tanimoto and Mahalanobis distances for this purpose. The computations are carried out in a SAS environment. As an example, the method is applied to data on alcoholics in traffic.
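    A sketch of such a combined dissimilarity (the relative weighting of the two parts is a design choice made here for illustration; the paper's SAS implementation is not reproduced):

      import numpy as np

      def tanimoto_dist(a, b):
          # 1 - |intersection| / |union| for binary vectors.
          union = np.logical_or(a, b).sum()
          return 0.0 if union == 0 else 1 - np.logical_and(a, b).sum() / union

      def mahalanobis_dist(x, y, S_inv):
          d = x - y
          return float(np.sqrt(d @ S_inv @ d))

      def mixed_dist(a, b, x, y, S_inv, w=0.5):
          return w * tanimoto_dist(a, b) + (1 - w) * mahalanobis_dist(x, y, S_inv)

      rng = np.random.default_rng(7)
      cont = rng.normal(size=(50, 3))
      S_inv = np.linalg.inv(np.cov(cont, rowvar=False))
      a, b = np.array([1, 0, 1, 1]), np.array([1, 1, 0, 1])
      print(mixed_dist(a, b, cont[0], cont[1], S_inv))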
    Download PDF (322K)
  • A METHOD TO DECIDE THE DIMENSION OF DATA BY THE MDL CRITERION
    Tomoya Tokairin, Yoshiharu Sato
    2003 Volume 15 Issue 2 Pages 361-368
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Under noisy conditions, we apply the shrinkage method to remove the characteristic roots of the sample covariance matrix that are smaller than an optimal threshold level, and we apply the Approximate Minimum Description Length (AMDL) criterion to decide this optimal threshold level. Since the characteristic roots smaller than the optimal threshold level are regarded as the noise components of the data, removing them yields the significant characteristic roots; in other words, it determines the intrinsic, true dimension of the data. In this paper we assume the sample covariance matrix has a Wishart distribution, so that the limiting joint distribution of the characteristic roots of the sample covariance matrix can be obtained simply. Moreover, we show some numerical examples which compare this method with conventional methods.
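    The AMDL criterion is related to the classical MDL rule of Wax and Kailath (1985) for counting "signal" eigenvalues, sketched here for orientation (this is the classical rule, not the paper's AMDL procedure):

      import numpy as np

      def mdl_order(eigvals, n_samples):
          lam = np.sort(eigvals)[::-1]
          p = len(lam)
          scores = []
          for k in range(p):
              tail = lam[k:]                              # candidate noise roots
              gm = np.exp(np.mean(np.log(tail)))          # geometric mean
              am = np.mean(tail)                          # arithmetic mean
              scores.append(-n_samples * (p - k) * np.log(gm / am)
                            + 0.5 * k * (2 * p - k) * np.log(n_samples))
          return int(np.argmin(scores))

      rng = np.random.default_rng(8)
      X = rng.normal(size=(400, 6)) @ np.diag([3, 2.5, 1, 1, 1, 1])
      eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
      print(mdl_order(eigvals, 400))   # should pick the 2 "signal" components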
    Download PDF (666K)
  • Yutaka Tanaka, Yoshimichi Ochi
    2003 Volume 15 Issue 2 Pages 369
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Download PDF (126K)
  • 2003 Volume 15 Issue 2 Pages 372-376
    Published: 2003
    Released on J-STAGE: December 09, 2009
    JOURNAL FREE ACCESS
    Download PDF (431K)