Dimensionality reduction is an important preprocessing technique for high-dimensional data. We extended principal component analysis (PCA) to smoothed PCA by introducing a differential penalty on the latent variables within each class. A nonlinear extension of this method based on kernel methods is also proposed. We applied the method to data whose observations change over time and to p >> n data (more variables than observations).
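The penalty is not spelled out above, so the following is only a minimal sketch under one plausible reading. We assume the samples of each class are ordered in time, and we penalize squared differences between consecutive latent scores within the same class, z = Xw. Maximizing score variance minus λ times this roughness gives the leading eigenvectors of X^T (I - λ D^T D) X, where D is a within-class first-difference matrix. The names `class_difference_matrix`, `smoothed_pca`, and `lam` are hypothetical, and the kernel extension is not shown.

```python
import numpy as np


def class_difference_matrix(labels):
    """First-difference operator D, applied only between consecutive samples of the same class."""
    labels = np.asarray(labels)
    n = len(labels)
    rows = []
    for i in range(n - 1):
        if labels[i] == labels[i + 1]:
            row = np.zeros(n)
            row[i], row[i + 1] = -1.0, 1.0
            rows.append(row)
    return np.array(rows).reshape(-1, n)


def smoothed_pca(X, labels, n_components=2, lam=1.0):
    """Assumed formulation: maximize w^T X^T X w - lam * ||D X w||^2 subject to ||w|| = 1."""
    Xc = X - X.mean(axis=0)                      # center the columns
    D = class_difference_matrix(labels)
    M = Xc.T @ Xc - lam * Xc.T @ D.T @ D @ Xc    # variance minus within-class roughness
    eigvals, eigvecs = np.linalg.eigh(M)         # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                        # orthonormal loading vectors
    return Xc @ W, W                             # latent scores and loadings
```

With `lam=0` this reduces to ordinary PCA. Because M is p×p, a p >> n setting would favor a dual (n×n) or kernel formulation instead of this primal form.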
In the present article, we discuss a flexible method for modeling censored survival data using penalized smoothing splines when covariate values change over the course of the study. The Cox proportional hazards model has been widely used to analyze treatment and prognostic effects in censored survival data. However, a number of theoretical problems concerning the baseline survival function and the baseline cumulative hazard function remain unsolved. The basic idea of the present article is to estimate the survival function with a logistic regression model and generalized additive models (GAM) based on B-splines. A variant of the n-fold cross-validation method for GAM is proposed for selecting the optimal smoothing parameters. The proposed method is compared with the generalized cross-validation (GCV) method using data from a long-term study of patients with primary biliary cirrhosis (PBC), with the aim of supporting the decision on when to undertake liver transplantation.
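The cross-validation variant itself is not specified above, so this sketch substitutes ordinary K-fold cross-validation of the binomial deviance to choose the smoothing parameter λ of a penalized B-spline (P-spline) logistic regression. This is the kind of building block a discrete-time survival GAM could use, with one row per patient-interval and y = 1 for an event in that interval. The knot layout, the second-order difference penalty, and all function names are assumptions, not the authors' implementation.

```python
import numpy as np


def bspline_basis(x, xmin, xmax, n_inner=8, degree=3):
    """Cubic B-spline basis on equally spaced knots (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    dx = (xmax - xmin) / n_inner
    knots = xmin + dx * np.arange(-degree, n_inner + degree + 1)
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-(d + 1)]) / (knots[None, d:-1] - knots[None, :-(d + 1)])
        right = (knots[None, d + 1:] - x[:, None]) / (knots[None, d + 1:] - knots[None, 1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (n, n_inner + degree)


def fit_pspline_logistic(B, y, lam, order=2, n_iter=50):
    """Penalized IRLS for logistic regression with a difference penalty on spline coefficients."""
    k = B.shape[1]
    D = np.diff(np.eye(k), n=order, axis=0)
    P = lam * D.T @ D
    beta = np.zeros(k)
    for _ in range(n_iter):
        eta = B @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        z = eta + (y - mu) / w                      # working response
        beta_new = np.linalg.solve(B.T @ (w[:, None] * B) + P, B.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < 1e-8
        beta = beta_new
        if converged:
            break
    return beta


def cv_select_lambda(x, y, lambdas, n_folds=5, seed=0):
    """Pick lam minimizing held-out binomial deviance over K random folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    B = bspline_basis(x, x.min(), x.max())
    scores = []
    for lam in lambdas:
        dev = 0.0
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            beta = fit_pspline_logistic(B[tr], y[tr], lam)
            p = np.clip(1.0 / (1.0 + np.exp(-B[te] @ beta)), 1e-12, 1 - 1e-12)
            dev += -2.0 * np.sum(y[te] * np.log(p) + (1 - y[te]) * np.log(1 - p))
        scores.append(dev)
    return lambdas[int(np.argmin(scores))], scores
```

Because the B-spline basis sums to one, it already contains the intercept, and the second-order penalty leaves constant and linear trends unpenalized.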
This paper describes the current state of a hospital information system in which all clinical data are stored. Such a system can be viewed as a mapping of all clinical records into a cyberspace composed of huge spatiotemporal databases. Spatiotemporal data mining therefore plays an important role in analyzing these complex data, which will lead to a new style of hospital management.
We show a method for predicting friendship relations among students from records of their attendance times at lectures. The recording system consists of student ID cards and card readers installed in lecture rooms, which record the times at which a student enters or leaves a room. The distribution of the difference between the recorded times of two students depends on whether they are friends. Using this distribution, we show a method for estimating the probability that a pair of students are friends from their time records. This probability can be used to analyze the group structure of a class and to observe the characteristics of classes.
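The probability computation can be sketched with Bayes' rule, assuming (hypothetically) that arrival-time differences are independent across lectures and follow a narrow Laplace distribution for friends and a wide one for non-friends. The scales, the prior, and the function names are illustrative assumptions. In practice the two distributions would be estimated from the recorded data.

```python
import math


def log_laplace(d, scale):
    """Log-density of a zero-centered Laplace distribution (assumed model of time differences)."""
    return -abs(d) / scale - math.log(2.0 * scale)


def friendship_probability(diffs, prior=0.1, friend_scale=30.0, stranger_scale=600.0):
    """Posterior P(friends | time differences in seconds), assuming independent lectures."""
    log_f = math.log(prior) + sum(log_laplace(d, friend_scale) for d in diffs)
    log_s = math.log(1.0 - prior) + sum(log_laplace(d, stranger_scale) for d in diffs)
    m = max(log_f, log_s)  # subtract the max for numerical stability
    return math.exp(log_f - m) / (math.exp(log_f - m) + math.exp(log_s - m))


# Pairs who repeatedly swipe in within seconds of each other get a high probability.
print(friendship_probability([5, 10, 3, 8, 12]))
print(friendship_probability([400, 900, 1200]))
```

Pairwise probabilities like these could then feed a graph-clustering step to study the group structure of a class.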
In recent years, the mining of frequent subgraph patterns in a set of labeled graphs or in a single large labeled graph has been studied extensively, and several major algorithms that find the complete set of frequent patterns within a limited amount of time have been established. However, to the best of our knowledge, no method has been proposed to find sequential patterns of graphs in a long graph sequence or a set of graph sequences. In this paper, we propose an efficient method to enumerate frequent sequential patterns of graphs by defining a union graph of graphs and operators of graph transformation. Its performance was evaluated on an artificial dataset, and the efficiency of the approach was confirmed in terms of computation time.
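The two ingredients named above can be sketched as follows. The union graph collects every vertex and edge that appears anywhere in a sequence. Consecutive graphs are rewritten as lists of transformation operators. The operator set used here (vertex/edge insertion, deletion, and relabeling, tagged `vi`, `vd`, `vr`, `ei`, `ed`, `er`) is our assumption, and the frequent-pattern enumeration over these operator sequences is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def edge_key(u: int, v: int) -> Edge:
    """Normalize an undirected edge so that (u, v) and (v, u) coincide."""
    return (u, v) if u <= v else (v, u)


@dataclass
class Graph:
    vertices: Dict[int, str]  # vertex id -> vertex label
    edges: Dict[Edge, str]    # normalized edge -> edge label


def union_graph(sequence: List[Graph]) -> Tuple[Set[int], Set[Edge]]:
    """All vertex ids and edges that occur anywhere in the graph sequence."""
    vs: Set[int] = set()
    es: Set[Edge] = set()
    for g in sequence:
        vs.update(g.vertices.keys())
        es.update(g.edges.keys())
    return vs, es


def transformation_ops(g1: Graph, g2: Graph) -> List[tuple]:
    """Hypothetical operator list that rewrites g1 into g2."""
    ops: List[tuple] = []
    for v in sorted(g2.vertices.keys() - g1.vertices.keys()):
        ops.append(("vi", v, g2.vertices[v]))
    for v in sorted(g1.vertices.keys() - g2.vertices.keys()):
        ops.append(("vd", v))
    for v in sorted(g1.vertices.keys() & g2.vertices.keys()):
        if g1.vertices[v] != g2.vertices[v]:
            ops.append(("vr", v, g2.vertices[v]))
    for e in sorted(g2.edges.keys() - g1.edges.keys()):
        ops.append(("ei", e, g2.edges[e]))
    for e in sorted(g1.edges.keys() - g2.edges.keys()):
        ops.append(("ed", e))
    for e in sorted(g1.edges.keys() & g2.edges.keys()):
        if g1.edges[e] != g2.edges[e]:
            ops.append(("er", e, g2.edges[e]))
    return ops


def transformation_sequence(sequence: List[Graph]) -> List[List[tuple]]:
    """Operator lists between each pair of consecutive graphs."""
    return [transformation_ops(a, b) for a, b in zip(sequence, sequence[1:])]
```

Expressing a graph sequence as operator lists lets a miner count recurring transformations instead of repeatedly matching whole graphs.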
We have investigated chemical structure data mining on the basis of graph mining techniques. This paper proposes a quick approach to structure data mining of chemicals using path fragments in which both terminal atoms are heteroatoms. These path fragments can be mined from individual molecular graphs by using their topological distance matrices. A path fragment is described by its two labeled terminal atoms and the number of bonds in the shortest path between them. For explicit representation, each fragment is expressed as a short text string that is intuitive and easy to interpret. The fragments are then subjected to statistical analysis. To test our approach, 370 dopamine agonists that interact with different types of receptors (D1, D2, and autoreceptor) were used in a computational experiment. The results suggest, for example, that the fragments N6O and N7O are frequent features of the D1 agonists, N5O of the D2 agonists, and N3O and N4O of the autoreceptor agonists. These features are easy to relate to the chemical structures. We believe that this approach could be useful for knowledge discovery in structure-activity studies of chemicals.
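The fragment extraction can be sketched as follows. We compute the topological distance matrix with Floyd-Warshall, then emit a string such as `N6O` (terminal atoms in alphabetical order, with the shortest-path bond count between them) for every heteroatom pair. We assume hydrogens are implicit and treat any element other than C and H as a heteroatom. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) via Floyd-Warshall."""
    INF = float("inf")
    dist = [[0 if i == j else INF for j in range(n_atoms)] for i in range(n_atoms)]
    for a, b in bonds:
        dist[a][b] = dist[b][a] = 1
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def hetero_path_fragments(atoms, bonds):
    """Count fragments like 'N6O' for every connected pair of heteroatoms."""
    dist = topological_distances(len(atoms), bonds)
    hetero = [i for i, el in enumerate(atoms) if el not in ("C", "H")]
    frags = Counter()
    for i, j in combinations(hetero, 2):
        if dist[i][j] != float("inf"):
            a, b = sorted((atoms[i], atoms[j]))
            frags[f"{a}{dist[i][j]}{b}"] += 1
    return frags


# Dopamine, heavy atoms only: amine N, ethyl chain, benzene ring, two ring OH groups.
DOPAMINE_ATOMS = ["N", "C", "C", "C", "C", "C", "C", "C", "C", "O", "O"]
DOPAMINE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                  (6, 7), (7, 8), (8, 3), (5, 9), (6, 10)]

if __name__ == "__main__":
    print(hetero_path_fragments(DOPAMINE_ATOMS, DOPAMINE_BONDS))
```

For dopamine this yields N6O and N7O (the amine to the meta and para hydroxyl oxygens) plus O3O. Counts per molecule can then be tabulated against receptor type for the statistical step.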
Graph isomorphism testing is an important and essential task for all methodologies dealing with graphs, such as graph mining and graph queries. However, it is a computationally hard problem and has been a bottleneck in various applications that use graphs: a naive test that tries every vertex correspondence costs O(n!). In this paper, we propose a method to check graph isomorphism using graph spectra and other graph invariants. The spectrum of a graph is the ordered list of eigenvalues of a symmetric matrix representing the graph, and it is invariant under any relabeling of the vertices of the same graph. Our method approximately checks whether two graphs are isomorphic by comparing their graph spectra as well as other graph invariants. This approach can be more efficient than existing tests because the spectrum of an n×n symmetric matrix can be computed in polynomial time (O(n^3) with standard dense eigensolvers). Through experimental evaluation, we show that the spectrum of the signless Laplacian matrix of a given graph and that of the line graph of the original graph are mutually complementary, and that using both spectra achieves an approximate but highly accurate and highly efficient graph isomorphism test.
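The test described above can be sketched with NumPy: compare vertex and edge counts and degree sequences, then the signless Laplacian (Q = D + A) spectra of both graphs and of their line graphs. The order of checks, the tolerance, and the function names are our assumptions. A `True` result means only "not distinguished by these invariants", because non-isomorphic cospectral graphs can exist.

```python
import numpy as np


def signless_laplacian_spectrum(adj):
    """Sorted eigenvalues of Q = D + A for a symmetric 0/1 adjacency matrix."""
    A = np.asarray(adj, dtype=float)
    Q = np.diag(A.sum(axis=1)) + A
    return np.sort(np.linalg.eigvalsh(Q))


def line_graph(adj):
    """Adjacency matrix of the line graph: edges become vertices, adjacent if they share an endpoint."""
    A = np.asarray(adj)
    n = A.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j]]
    m = len(edges)
    L = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            if set(edges[a]) & set(edges[b]):
                L[a, b] = L[b, a] = 1.0
    return L


def probably_isomorphic(adj1, adj2, tol=1e-6):
    """Approximate test: False is certain, True means no invariant separated the graphs."""
    A1, A2 = np.asarray(adj1), np.asarray(adj2)
    if A1.shape != A2.shape or A1.sum() != A2.sum():
        return False  # different vertex or edge counts
    if not np.array_equal(np.sort(A1.sum(axis=1)), np.sort(A2.sum(axis=1))):
        return False  # different degree sequences
    if not np.allclose(signless_laplacian_spectrum(A1), signless_laplacian_spectrum(A2), atol=tol):
        return False
    L1, L2 = line_graph(A1), line_graph(A2)
    if L1.shape[0] == 0:
        return True  # both graphs are edgeless with equal vertex counts
    return bool(np.allclose(signless_laplacian_spectrum(L1), signless_laplacian_spectrum(L2), atol=tol))
```

For example, the star K(1,3) and the triangle plus an isolated vertex share the signless Laplacian spectrum {0, 1, 1, 4} and also have identical line graphs, yet the degree-sequence check separates them. This illustrates why the spectra are combined with other invariants.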
This paper describes a method for identifying the damping coefficient of a pressure regulator with a nonlinear structure by applying a particle filter to a nonlinear Gaussian state space model. Within the framework of data assimilation, we present a method for fusing a simulation model with observation data for nonlinear vibration systems.
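The regulator model is not given above, so this sketch uses a hypothetical Duffing-type oscillator, x'' = -c x' - k x - k3 x^3, as a stand-in for a nonlinear vibration system. The damping coefficient c is appended to the state as a slowly jittered parameter. A bootstrap particle filter with Gaussian process and observation noise then assimilates noisy displacement measurements. All constants, noise levels, and function names are illustrative assumptions.

```python
import numpy as np


def simulate(c, n_steps=1000, dt=0.02, k=1.0, k3=0.5, obs_sd=0.02, seed=0):
    """Noisy displacement observations of the hypothetical oscillator (semi-implicit Euler)."""
    rng = np.random.default_rng(seed)
    x, v = 1.0, 0.0
    ys = np.empty(n_steps)
    for t in range(n_steps):
        v += dt * (-c * v - k * x - k3 * x**3)
        x += dt * v
        ys[t] = x + obs_sd * rng.normal()
    return ys


def systematic_resample(weights, rng):
    """Indices drawn by systematic resampling from normalized weights."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)  # guard against round-off in the cumulative sum


def estimate_damping(ys, n_particles=2000, dt=0.02, k=1.0, k3=0.5,
                     obs_sd=0.02, state_sd=0.002, param_sd=0.001, seed=1):
    """Particle filter on the augmented state (x, v, c); returns the filtered mean of c."""
    rng = np.random.default_rng(seed)
    x = 1.0 + 0.05 * rng.normal(size=n_particles)
    v = 0.05 * rng.normal(size=n_particles)
    c = rng.uniform(0.0, 1.0, size=n_particles)   # vague prior on the damping coefficient
    logw = np.zeros(n_particles)
    w = np.full(n_particles, 1.0 / n_particles)
    for y in ys:
        # Random-walk jitter keeps parameter particles diverse; abs() keeps c non-negative.
        c = np.abs(c + param_sd * rng.normal(size=n_particles))
        # Propagate the simulation model with Gaussian process noise on the velocity.
        v = v + dt * (-c * v - k * x - k3 * x**3) + state_sd * rng.normal(size=n_particles)
        x = x + dt * v
        # Gaussian likelihood of the observed displacement.
        logw = logw - 0.5 * ((y - x) / obs_sd) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample only when the effective sample size falls below half the particles.
        if 1.0 / np.sum(w**2) < n_particles / 2:
            idx = systematic_resample(w, rng)
            x, v, c = x[idx], v[idx], c[idx]
            logw = np.zeros(n_particles)
            w = np.full(n_particles, 1.0 / n_particles)
    return float(np.sum(w * c))


if __name__ == "__main__":
    print(estimate_damping(simulate(0.3)))
```

Augmenting the state with the unknown parameter is one common way to perform identification inside a filter. The jitter scale trades tracking ability against estimator variance.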
In reinforcement learning, using a linear model for value function approximation is promising because it scales well to large problems. When such a method is applied to practical reinforcement learning problems, choosing an appropriate model is important, because approximation performance depends heavily on that choice. In this paper, we propose a new method of model selection from sample data and demonstrate its effectiveness on the chain walk and inverted pendulum problems.
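The proposed selection criterion is not described above, so this sketch only illustrates the setting. It uses linear value-function approximation fitted by least-squares temporal difference (LSTD) learning on a hypothetical 20-state chain walk, and chooses the polynomial degree of the features by K-fold cross-validation of the squared TD error on held-out transitions. That criterion is our stand-in, not the authors' method. The chain dynamics, reward, and names are assumptions.

```python
import numpy as np


def sample_chain_walk(n_states=20, n_samples=5000, seed=0):
    """Random-policy transitions on a chain; reward 1 on reaching either end (assumed)."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, n_states, size=n_samples)
    step = rng.choice([-1, 1], size=n_samples)
    s_next = np.clip(s + step, 0, n_states - 1)
    r = ((s_next == 0) | (s_next == n_states - 1)).astype(float)
    return s, r, s_next


def poly_features(s, degree, n_states):
    """Polynomial features of the state rescaled to [-1, 1]."""
    z = 2.0 * s / (n_states - 1) - 1.0
    return np.vander(z, degree + 1, increasing=True)


def lstd(phi, r, phi_next, gamma, reg=1e-6):
    """LSTD weights solving sum phi (phi - gamma phi')^T w = sum phi r."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ r
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)


def select_degree(s, r, s_next, degrees, n_states, gamma=0.9, n_folds=5, seed=0):
    """Choose the feature degree with the smallest held-out mean squared TD error."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(s)) % n_folds
    scores = {}
    for d in degrees:
        phi, phi_next = poly_features(s, d, n_states), poly_features(s_next, d, n_states)
        err = 0.0
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            w = lstd(phi[tr], r[tr], phi_next[tr], gamma)
            td = r[te] + gamma * phi_next[te] @ w - phi[te] @ w
            err += np.mean(td**2)
        scores[d] = err / n_folds
    best = min(scores, key=scores.get)
    return best, scores
```

Held-out TD error is a convenient but biased proxy for value-function error, which is one motivation for dedicated model selection criteria.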
In this paper, we propose a method that extracts asynchronous patterns from large sequential data based on frequency and self-information. Sequential data mining based on frequency has been studied widely, but such methods are not always useful: in several applications, high-frequency patterns are merely meaningless noise. We suppress these noisy patterns by using not only frequency but also self-information. This method builds on the study by Yang et al., which aimed to extract well-balanced patterns from the viewpoint of both frequency and self-information. Additionally, we introduce a sliding window that enables the extraction of asynchronous patterns. To make the computation efficient, we introduce more elaborate pruning methods and reorganize the search space. Empirical tests demonstrate the efficiency and usefulness of the proposed method.
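This sketch scores a candidate pattern by combining frequency with self-information, I(a) = -log2 P(a). Frequency is counted with a sliding window in which the pattern's events may occur in any order, which is our reading of "asynchronous". The score (window support multiplied by the summed self-information of the pattern's events) is an illustrative assumption and not necessarily the measure of Yang et al. The pruning and search-space reorganization are not shown.

```python
import math
from collections import Counter


def self_information(sequence):
    """I(a) = -log2 P(a), with P estimated from symbol frequencies in the sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {sym: -math.log2(c / n) for sym, c in counts.items()}


def window_support(sequence, pattern, width):
    """Number of sliding windows of the given width that contain every event of the pattern."""
    events = set(pattern)
    return sum(1 for i in range(len(sequence) - width + 1)
               if events <= set(sequence[i:i + width]))


def pattern_score(sequence, pattern, width):
    """Assumed score: window support times total self-information of the pattern's events."""
    info = self_information(sequence)
    return window_support(sequence, pattern, width) * sum(info[p] for p in set(pattern))


seq = "abacabad"
print(pattern_score(seq, "ab", 2))  # 4 windows contain a and b; I(a) + I(b) = 1 + 2 bits
```

Weighting by self-information lowers the score of patterns built only from ubiquitous symbols, which is how frequent but uninformative patterns get suppressed.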
This paper proposes an algorithm for relational frequent pattern mining based on the logical structure of examples. It uses bottom-up property extraction from examples. The extracted properties are combined into patterns by two operations: one forms the conjunction of patterns, and the other builds a pattern that preserves the structure appearing in the examples. We show a kind of downward closure property among the patterns constructed by these operations. Using this property, we give a mining algorithm and show its correctness. We also prove that the algorithm does not produce redundant patterns in the sense of logical equivalence.
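To show how a downward closure property supports level-wise mining, here is an Apriori-style sketch over the conjunction operation only. Properties are modeled as propositional predicates on examples, and relational (first-order) structure and the structure-preserving operation are not modeled. Representing a conjunction as a frozenset of property names means reordered conjunctions, which are logically equivalent, are generated only once. The function name and data are hypothetical.

```python
from itertools import combinations


def mine_conjunctions(examples, properties, min_support):
    """Level-wise mining of frequent conjunctions of properties.

    examples: list of objects; properties: dict mapping a name to a predicate on an example.
    Returns a dict mapping each frequent pattern (a frozenset of names) to its support.
    """
    def support(pattern):
        return sum(1 for e in examples if all(properties[p](e) for p in pattern))

    level = [frozenset([p]) for p in sorted(properties) if support(frozenset([p])) >= min_support]
    frequent = {pat: support(pat) for pat in level}
    k = 1
    while level:
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Downward closure: keep a candidate only if all its k-subsets are frequent.
            if len(union) == k + 1 and all(frozenset(s) in frequent for s in combinations(union, k)):
                candidates.add(union)
        level = []
        for cand in candidates:
            s = support(cand)
            if s >= min_support:
                frequent[cand] = s
                level.append(cand)
        k += 1
    return frequent


examples = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
properties = {name: (lambda e, name=name: name in e) for name in "abc"}
print(mine_conjunctions(examples, properties, min_support=2))
```

Here `{a, b, c}` is never counted, because its subset `{b, c}` is already infrequent.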