In this article, we study the limiting distributions of sample quantiles from a finite population. We give a simple proof of the asymptotic normality of sample quantiles for simple random samples from a finite population by employing the method of Wretman (1978). Some Monte Carlo simulation results are also included.
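A Monte Carlo check of this kind can be sketched as follows. This is only an illustrative sketch, not the simulation design used in the article: the exponential population, the population size, the sample size, and the function name `simulate_sample_quantiles` are all assumptions made here for illustration.

```python
import numpy as np

def simulate_sample_quantiles(population, n, p, reps, seed=0):
    """Draw `reps` simple random samples without replacement and
    return the sample p-quantile of each one."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for r in range(reps):
        sample = rng.choice(population, size=n, replace=False)
        out[r] = np.quantile(sample, p)
    return out

# Hypothetical finite population (assumed, not from the article).
pop_rng = np.random.default_rng(42)
population = pop_rng.exponential(scale=1.0, size=5000)

quantiles = simulate_sample_quantiles(population, n=500, p=0.5, reps=2000)
pop_median = np.quantile(population, 0.5)

# Standardize the simulated medians; under asymptotic normality,
# roughly 95% of them should fall within 1.96 standard deviations.
z = (quantiles - quantiles.mean()) / quantiles.std()
coverage = np.mean(np.abs(z) < 1.96)
print(f"population median: {pop_median:.4f}")
print(f"mean of sample medians: {quantiles.mean():.4f}")
print(f"share of |z| < 1.96: {coverage:.3f}")
```

A histogram or normal Q-Q plot of `z` would give a more direct visual check of the normal approximation.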
This article presents flexible methods for modeling censored survival data using penalized smoothing splines when the covariate values change over the duration of the study. The Cox proportional hazards model has been widely used for the analysis of censored survival data. However, a number of theoretical problems with respect to the baseline survival function and the baseline cumulative hazard function remain unsolved. The basic concept considered in the present article is to use generalized additive models (GAM) with B-splines to estimate the survival function without the baseline hazard assumption. The proposed methods are discussed according to the way in which they deal with censored observations, competing risks, and time-dependent covariates. We evaluate the performance of the proposed methods for predicting loan default, with early payment as a competing risk, using data from a U.K. financial institution.
In test data analysis, it is very important to clarify the underlying inter-item dependency structure. Once the structure is identified, we can reorder the units in the relevant textbook and gain a deeper understanding of the errors made by students. The asymmetric triangulation scaling technique proposed in this paper is a kind of asymmetric multidimensional scaling technique used for visualizing the inter-item dependency structure in a 3D hemisphere by analyzing the conditional correct response rate matrix.
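As a rough illustration of the input to such a technique, the sketch below builds a conditional correct response rate matrix from binary item responses. The definition used here (entry (i, j) is the share of students who answered item j correctly among those who answered item i correctly) is an assumption for illustration, as are the toy data and the function name; the paper's exact definition may differ.

```python
import numpy as np

def conditional_correct_rates(responses):
    """Entry (i, j): proportion answering item j correctly among
    those who answered item i correctly (NaN if nobody got item i)."""
    X = np.asarray(responses, dtype=float)   # students x items, 0/1
    joint = X.T @ X                          # joint correct counts
    n_correct = np.diag(joint)               # correct counts per item
    rates = np.full_like(joint, np.nan)
    np.divide(joint, n_correct[:, None], out=rates,
              where=n_correct[:, None] > 0)
    return rates

# Toy data (made up): 4 students, 3 items.
responses = [[1, 1, 0],
             [1, 0, 0],
             [1, 1, 1],
             [0, 0, 0]]
C = conditional_correct_rates(responses)
print(C)
```

The matrix is generally asymmetric: here everyone who solved item 1 also solved item 0, but not the reverse, which is the kind of dependency an asymmetric scaling method is meant to display.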
We propose a Bayesian approach to asymmetric multidimensional scaling (MDS), which incorporates an asymmetric data structure. The asymmetry is represented by the hill-climbing model, which introduces a slope vector that measures how much more difficult it is to go from one point to another than to go in the opposite direction in the MDS space. By using Bayesian estimation with a Markov chain Monte Carlo algorithm, both point and interval estimation of the parameters become possible, in addition to the many other advantages of Bayesian estimation. The asymmetry is evaluated on the basis of the posterior credibility region of the slope vector. A numerical simulation demonstrates that the proposed method is effective for recovering the true parameter values. The proposed method is demonstrated by the analysis of brand-switching data.
Seriation and multidimensional scaling are two techniques aimed at exploring relationships in dominance or proximity data matrices. Rodgers and Thompson (1992) argued that the two approaches can profitably interact in the analysis of asymmetric proximity matrices, and proposed a method that uses seriation to define an empirical ordering of the stimuli, and symmetric multidimensional scaling to scale the two separate triangles of the proximity matrix defined by this ordering (an approach anticipated by Method 3 in Gower (1977)). Following a similar concept, this paper proposes some procedures to explore seriation through asymmetric multidimensional scaling. The paper focuses on the skew-symmetric components of a particular class of asymmetric matrices (including, e.g., tournament or paired comparison matrices). Two short examples of possible applications are provided to illustrate the procedures in both the dichotomous and quantitative cases.
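The skew-symmetric component mentioned above comes from the standard decomposition of a square matrix A into a symmetric part (A + Aᵀ)/2 and a skew-symmetric part (A − Aᵀ)/2. The sketch below applies it to a made-up tournament matrix and orders the objects by the row sums of the skew-symmetric part. The row-sum ordering is a simple stand-in chosen for illustration and is not claimed to be the procedure proposed in the paper.

```python
import numpy as np

# Toy tournament matrix (made up): A[i, j] = 1 if object i beats object j.
A = np.array([[0, 0, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 0, 0],
              [1, 0, 1, 0]], dtype=float)

S = (A + A.T) / 2   # symmetric component
K = (A - A.T) / 2   # skew-symmetric component: K == -K.T

# Simple dominance score: row sums of the skew-symmetric part.
scores = K.sum(axis=1)
order = np.argsort(-scores)   # most dominant object first

print("scores:", scores)
print("ordering:", order)
```

For this toy matrix, object 1 beats every other object and object 2 beats none, so they land at the two ends of the ordering.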
We propose to use the mixed effects trend vector model for modeling repeated multinomial choice data in the form of a square contingency table. Such data often show asymmetries, where more people change from category a to b than the other way around. In many cases an investigator has, besides the actual choices of the participants, auxiliary variables that pertain to the subjects under study. Most methodologies for asymmetric data do not take such variables into account. We show how to incorporate these auxiliary variables into the mixed effects trend vector model and how they can be used to study differential change. The models are illustrated in detail with data from the Dutch parliamentary election studies 2006.
Automatic extraction of term semantic classes is an interesting task in the Information Retrieval field. Models such as Latent Dirichlet Allocation or Probabilistic Semantic Indexing are able to provide the probability that a term belongs to a given semantic class. However, such models provide neither Euclidean coordinates for terms nor a hierarchical structure to organize the latent semantic classes, which makes it difficult to visualize the information under consideration. In this work we propose a hierarchical latent topic extraction method that exploits the information contained in asymmetric term similarity matrices. Our method produces Mercer Gram matrices for terms organized by frequency levels and then hierarchically combines classes belonging to different levels. Euclidean coordinates for terms can be recovered from the proposed kernel matrices. Our proposal also provides explicit conditional probabilities, as the Latent Dirichlet Allocation model does, while avoiding the computational burden usually involved in the iterative step of such probabilistic models. Finally, we analyze a real database to show the advantages of the new approach.
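The step of recovering Euclidean coordinates from a Mercer (positive semidefinite) Gram matrix can be sketched with a standard eigendecomposition, in the style of classical scaling or kernel PCA. This is a generic illustration under that assumption, not the paper's construction of the kernel itself; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def coordinates_from_gram(K, dim):
    """Return points X (n x dim) such that X @ X.T approximates the
    positive semidefinite Gram matrix K."""
    K = (K + K.T) / 2                         # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(K)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]     # indices of the largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)    # clip tiny negative values
    return eigvecs[:, top] * np.sqrt(lam)

# Synthetic check: build a Gram matrix from known 2-D points.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(6, 2))
K = X_true @ X_true.T

X_rec = coordinates_from_gram(K, dim=2)
print(np.allclose(X_rec @ X_rec.T, K))
```

The recovered coordinates match the originals only up to rotation and reflection, since the Gram matrix determines inner products rather than an absolute orientation.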
Brand switching data among 12 margarine brands were analyzed using asymmetric multidimensional scaling based on the singular value decomposition. A two-dimensional result was adopted as the solution. A configuration based on the left and right singular vectors is given along each dimension. The left singular vector represents an outward tendency of switching from the corresponding brand to the other brands, and the right singular vector represents an inward tendency of being switched to the corresponding brand from the other brands. The configuration along Dimension 1 shows that the three brands with the largest market shares compete vigorously with each other. The configuration along Dimension 2 classifies the 12 brands into two groups; brand switching between the two groups is small, while that within each group is large.
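The role of the left and right singular vectors can be sketched as follows. The switching matrix here is randomly generated for five hypothetical brands and is not the margarine data; any preprocessing or scaling of the singular vectors used in the original analysis is not reproduced.

```python
import numpy as np

# Synthetic switching counts (assumed): rows = brand switched from,
# columns = brand switched to. Diagonal is set to zero.
rng = np.random.default_rng(1)
n_brands = 5
switch = rng.integers(1, 50, size=(n_brands, n_brands)).astype(float)
np.fill_diagonal(switch, 0.0)

# Singular value decomposition: switch = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(switch)

k = 2                       # keep a two-dimensional solution
outward = U[:, :k]          # left singular vectors: switching from a brand
inward = Vt[:k, :].T        # right singular vectors: switching to a brand

# Rank-k reconstruction and its Frobenius-norm error.
approx = outward @ np.diag(s[:k]) @ inward.T
error = np.linalg.norm(switch - approx)
print("singular values:", np.round(s, 2))
print("rank-2 approximation error:", round(error, 2))
```

Because the matrix is asymmetric, `outward` and `inward` generally differ, so each brand receives two coordinates per dimension instead of one.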
This paper briefly reviews extant asymmetric MDS models and methods for two kinds of data: a one-mode, two-way asymmetric square relational data matrix whose elements are similarity or dissimilarity measures between objects, and a special two-mode, three-way asymmetric relational data matrix composed of one-mode, two-way asymmetric square relational data matrices. Several open problems are also discussed.