Starting from the data available about (anonymous) visitors of a website, possibilities for extracting information, for enriching pure browsing patterns with business measures (e-metrics), and for detecting possibly negative effects of web robots are sketched as prerequisites for interpreting the navigational behavior of internet users. Problems with the reconstruction of user navigation paths are explained, and different heuristics for path completion are discussed. Additionally, several kinds of features of user navigation paths (e.g., sets, sequences, path fragments) are introduced to provide an adequate theoretical background for recommender systems, which can be used for tasks as different as site personalization, cross-/up-selling, and navigation assistance. A vocabulary for describing different kinds of recommender systems and generic quality measures for system evaluation are formulated. Then, specific recommender systems, especially systems based on frequent path features, are defined and evaluated in a final experiment. An outlook gives directions for future research on recommender systems.
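The idea of a recommender based on frequent path features can be sketched minimally: count, over all reconstructed sessions, which page most often follows each observed path fragment, and recommend the most frequent successors of the visitor's current fragment. The session data, fragment order, and page names below are illustrative assumptions, not the paper's actual system.

```python
from collections import Counter, defaultdict

def build_model(sessions, order=2):
    """Count successor pages for every path fragment of the given order."""
    successors = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - order):
            fragment = tuple(session[i:i + order])
            successors[fragment][session[i + order]] += 1
    return successors

def recommend(successors, recent_pages, k=3):
    """Return up to k most frequent successors of the visitor's recent fragment."""
    return [page for page, _ in successors[tuple(recent_pages)].most_common(k)]

# toy sessions (hypothetical page names) standing in for reconstructed paths
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "detail", "cart"],
    ["home", "products", "cart", "checkout"],
]
model = build_model(sessions, order=2)
print(recommend(model, ["home", "products"]))  # ['cart', 'detail']
```

In a real system the fragment order and the ranking (raw frequency vs. confidence or lift) would be tuned against the quality measures the paper formulates.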
This paper considers a Bayesian approach for selecting the number of factors in a factor analysis model with continuous and polytomous variables. A procedure for computing an important statistic in model selection, namely the Bayes factor, is developed via path sampling. The main computational effort lies in simulating observations from the appropriate posterior distribution. This task is done by a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm. Bayesian estimates of thresholds, factor loadings, unique variances, and latent factor scores, as well as their standard errors, can be produced as by-products. The empirical performance of the proposed procedure is illustrated by means of a simulation study and a real example.
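Path sampling estimates a log ratio of normalizing constants by averaging the derivative of the log unnormalized density along a path linking the two models, then integrating over the path parameter. The toy below is not the factor model of the paper: it links two Gaussian scale families, where the true answer is known to be log(sigma), and draws exactly where the paper would use the Gibbs/Metropolis-Hastings hybrid.

```python
import math
import random

def path_sampling_log_ratio(sigma=2.0, grid=21, draws=4000, seed=0):
    """Estimate log(z1/z0) for q_t(x) = exp(-x^2 / (2 s(t)^2)),
    with s(t) linking s(0)=1 to s(1)=sigma; the true value is log(sigma)."""
    rng = random.Random(seed)
    means = []
    for i in range(grid):
        t = i / (grid - 1)
        s = (1 - t) + t * sigma          # linear path in the scale parameter
        ds = sigma - 1                   # ds/dt along the path
        # U(x, t) = d/dt log q_t(x) = x^2 * ds / s^3; here x ~ N(0, s^2) exactly
        u = [(rng.gauss(0, s) ** 2) * ds / s ** 3 for _ in range(draws)]
        means.append(sum(u) / draws)
    h = 1 / (grid - 1)                   # trapezoidal rule over the t grid
    return h * (means[0] / 2 + sum(means[1:-1]) + means[-1] / 2)

est = path_sampling_log_ratio()
print(est, math.log(2.0))  # estimate should be close to log 2 ≈ 0.693
```

Exponentiating the estimated log ratio gives the Bayes factor between the two endpoint models; the paper replaces the exact draws with MCMC output at each grid point.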
The entropic analog of the Schwarzian derivative (ESf) is used to explore the consequences of scaling and partitioning operations on time series that are known to be edge-of-chaos, or random. The nonlinear deterministic series may be fractal trajectories, and if so self-similarity may be a property that characterises their evolution. Examples are generated, and statistical methods that follow from their generation are noted. Some invariance properties of the ESf index that are associated with fractal self-similarity but not with other series appear to be a basis for process identification on relatively short time series. Surrogate and parameter optimisation methods are illustrated on real data. A possible relationship between the difference between an ESf and the ESf of its random surrogates, and the first derivative of the local largest Lyapunov exponent, is noted.
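The surrogate logic can be illustrated generically: shuffle surrogates preserve the amplitude distribution of a series but destroy its temporal ordering, so a statistic sensitive to temporal structure should separate the data from its surrogates. The ESf itself is not reproduced here; lag-1 autocorrelation stands in as the statistic, and the AR(1) test series is an assumption for the demo.

```python
import random

def lag1_autocorr(x):
    """Lag-1 autocorrelation, a stand-in for a temporal-structure statistic."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def surrogate_test(series, n_surr=200, seed=1):
    """Fraction of shuffle surrogates whose statistic matches or exceeds
    the observed one; a small fraction suggests genuine temporal structure."""
    rng = random.Random(seed)
    observed = lag1_autocorr(series)
    exceed = 0
    for _ in range(n_surr):
        s = series[:]
        rng.shuffle(s)
        if abs(lag1_autocorr(s)) >= abs(observed):
            exceed += 1
    return observed, exceed / n_surr

# AR(1) test series with strong serial dependence (illustrative, not ESf data)
rng = random.Random(2)
x, series = 0.0, []
for _ in range(500):
    x = 0.8 * x + rng.gauss(0, 1)
    series.append(x)
obs, p = surrogate_test(series)
print(obs, p)  # observed autocorrelation near 0.8; surrogate fraction near 0
```

The same scheme applies with the ESf in place of the autocorrelation, which is how the difference between an ESf and the ESf of its surrogates would be calibrated.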
The traditional Item Response Theory (IRT) models assume local independence, which is equivalent to the assumption of unidimensionality. This assumption states that a subject's responses to different items in a test are statistically independent. For the assumption to be true, a subject's performance on one item must not affect, either for better or for worse, his or her responses to any other items in the test. The main purpose of this paper is to relax the local independence assumption in the traditional IRT models by extending them to a network model. A new IRT model is defined which assumes probabilistic network structures in place of local independence. Another unique feature of the proposed model is that it is a new probabilistic network model whose conditional probability parameters depend on a latent trait variable. The information criteria AIC and BIC are used to evaluate the performance of the proposed model on actual test data; the results show that the proposed model outperforms the traditional model. In addition, this paper proposes an item selection criterion from a decision-theoretic approach. The amount of test information is defined as the mutual information between a variable for the item and all variables over the test, so as to maximize the prediction efficiency for the subject's responses. The new item selection method is used to compare the prediction efficiency of the proposed model and the traditional IRT model. The proposed model is shown to be more efficient.
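Mutual-information item selection can be sketched in a simplified setting: instead of the mutual information between an item and all other test variables, the toy below scores each candidate item by the mutual information between its binary response and a discretized latent trait under a standard two-parameter logistic (2PL) response function. The grid, prior, and item parameters are assumptions for illustration.

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function P(correct | theta)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_mutual_information(a, b, thetas, weights):
    """I(X; Theta) in nats for a binary item over a discrete prior on theta."""
    p_correct = sum(w * irf_2pl(t, a, b) for t, w in zip(thetas, weights))
    mi = 0.0
    for t, w in zip(thetas, weights):
        p = irf_2pl(t, a, b)
        for px, marg in ((p, p_correct), (1 - p, 1 - p_correct)):
            if px > 0:
                mi += w * px * math.log(px / marg)
    return mi

# uniform prior over a coarse theta grid; candidate items as (a, b) pairs
thetas = [-2, -1, 0, 1, 2]
weights = [0.2] * 5
items = {"easy": (1.0, -1.5), "matched": (1.5, 0.0), "hard": (1.0, 1.5)}
scores = {name: item_mutual_information(a, b, thetas, weights)
          for name, (a, b) in items.items()}
best = max(scores, key=scores.get)
print(best)  # the discriminating item centered on the prior wins
```

The same expected-information logic carries over to the network model, where the item's mutual information is taken with respect to the joint distribution of all test variables rather than a single latent trait.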
This article gives an overview of statistical analysis with latent variables. Using traditional structural equation modeling as a starting point, it shows how the idea of latent variables captures a wide variety of statistical concepts, including random effects, missing data, sources of variation in hierarchical data, finite mixtures, latent classes, and clusters. These latent variable applications go beyond the traditional latent variable usage in psychometrics, with its focus on measurement error and hypothetical constructs measured by multiple indicators. The article argues for the value of integrating statistical and psychometric modeling ideas. Different applications are discussed in a unifying framework that brings together in one general model such different analysis types as factor models, growth curve models, multilevel models, latent class models, and discrete-time survival models. The unifying framework makes clear several possible combinations and extensions of these models.