Expectation propagation (EP) extends belief propagation in that it allows users to choose arbitrarily the statistics used in its approximation scheme. In this paper, we use information geometry to study how the choice of statistics affects the performance of EP. A second-order perturbation analysis is used to evaluate the estimation error of EP. We compare the estimation errors of two EP formulations, one of which includes, and therefore extends, the other. Our analytical and numerical results show that including more statistics in the EP formulation does not necessarily yield smaller estimation errors.
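EP's basic update projects a distribution onto an exponential family by matching the expectations of the chosen statistics, so the statistics determine which moments the approximation preserves. The sketch below is not the paper's analysis; it shows a single such projection for a hypothetical distribution `p` over two ±1 variables. With only the statistics x1 and x2, the projection keeps both marginal means but loses the correlation E[x1 x2]. The names `expectation` and `project_factorized` are illustrative.

```python
import math


def expectation(dist, stat):
    """Expected value of statistic `stat` under a distribution given as {state: prob}."""
    return sum(prob * stat(x) for x, prob in dist.items())


def project_factorized(dist):
    """Moment-match onto the family whose only statistics are x1 and x2.

    For one +-1 variable with q(x) proportional to exp(theta * x), E[x] = tanh(theta),
    so matching m_i = E_p[x_i] fixes each factor as q_i(x_i) = (1 + m_i * x_i) / 2.
    """
    means = [expectation(dist, lambda x, i=i: x[i]) for i in range(2)]
    return {x: math.prod((1 + means[i] * x[i]) / 2 for i in range(2)) for x in dist}


# Hypothetical target with strongly correlated variables but zero marginal means.
p = {(1, 1): 0.4, (1, -1): 0.1, (-1, 1): 0.1, (-1, -1): 0.4}
q = project_factorized(p)
```

Adding x1*x2 as a statistic would make this single projection exact; the abstract's point is that, within full EP iterations, a richer set of statistics does not necessarily reduce the estimation error.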
We propose a new boosting algorithm for bipartite ranking problems. Our algorithm, called SoftRankBoost, is a modification of RankBoost that maintains only smooth distributions over the data. SoftRankBoost provably achieves approximately the maximum soft margin over all pairs of positive and negative examples, which implies a high AUC on future data.
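The link between pairwise margins and AUC rests on the standard identity that the empirical AUC is the fraction of positive-negative pairs that the scoring function orders correctly. The sketch below computes that fraction and the hard (minimum) pairwise margin for hypothetical scores. It does not implement SoftRankBoost or RankBoost, and the soft margin that SoftRankBoost targets, which tolerates some violated pairs, is not computed here; the function names are illustrative.

```python
def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def min_pairwise_margin(scores_pos, scores_neg):
    """Hard margin: the smallest gap f(x+) - f(x-) over all positive-negative pairs."""
    return min(sp - sn for sp in scores_pos for sn in scores_neg)


# Hypothetical scores: one of the six pairs (0.4 vs 0.5) is misordered.
pos, neg = [0.9, 0.7, 0.4], [0.5, 0.2]
auc = pairwise_auc(pos, neg)           # 5 of 6 pairs correct
margin = min_pairwise_margin(pos, neg)  # negative, since one pair is violated
```

A positive hard margin would imply an AUC of 1 on the sample; a soft margin relaxes this so that a few violated pairs do not dominate the objective.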
Forecasting the level of a stock price is very difficult because such a series is often described as an integrated process. Nevertheless, a linear combination of two stock prices can be stationary around a fixed mean, in which case taking a long position in one stock and a short position in the other can be profitable. This article aims to characterize long-short strategies based on cointegration by investigating their risk-return properties. We propose a framework for a long-short trading system based on cointegration analysis and define two trading strategies: a contrarian-type strategy and a momentum-type strategy. We also impose restrictions on investment choices by sector and by firm size, and investigate how such restrictions affect the risk-return properties of hypothetical long-short funds.
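A minimal sketch of the two strategy types, under assumptions the abstract does not state: the hedge ratio comes from an OLS regression of one price on the other (the first step of the Engle-Granger approach), a position is opened when the standardized spread moves more than `entry` standard deviations from its mean, and the momentum rule simply takes the opposite side of the contrarian rule. For brevity the parameters are estimated on the full sample, which introduces look-ahead bias, and no unit-root test is applied to the spread; a realistic backtest would estimate on a training window and test the residual for stationarity. The names `hedge_ratio`, `signals` and `entry` are hypothetical.

```python
import statistics


def hedge_ratio(y, x):
    """OLS intercept and slope of price y on price x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    return my - beta * mx, beta


def signals(y, x, entry=1.0, momentum=False):
    """Position in the spread y - alpha - beta*x at each time step.

    +1 means long the spread (long y, short beta units of x), -1 short it, 0 flat.
    """
    alpha, beta = hedge_ratio(y, x)
    spread = [b - alpha - beta * a for a, b in zip(x, y)]
    mu, sd = statistics.fmean(spread), statistics.pstdev(spread)
    if sd == 0:
        return [0] * len(spread)
    out = []
    for s in spread:
        z = (s - mu) / sd
        if z > entry:
            pos = -1  # spread is high: contrarian bets on reversion to the mean
        elif z < -entry:
            pos = 1   # spread is low: contrarian bets on a rise
        else:
            pos = 0
        # Momentum (assumed here) follows the deviation instead of fading it.
        out.append(-pos if momentum else pos)
    return out
```

For example, with `x = [10, ..., 17]` and `y = 2*x + 1` plus a zero-mean disturbance, `hedge_ratio` recovers a slope near 2, and `signals` shorts the spread whenever the disturbance pushes it more than one standard deviation above its mean.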
This paper discusses the discrete probability distribution induced by a modified bucket sort. The distribution is related to the sequence known as the Eulerian numbers, introduced by Euler (1755). There are subtle differences between the derivation of our distribution and that of the Eulerian numbers, which count permutations by their number of runs. A recurrence relation for the moments of the distribution is also given, and the mean, variance and higher-order cumulants are derived from it. An asymptotic expansion reveals interesting relationships among the proposed, normal and uniform distributions. The efficiency of the proposed method is illustrated through numerical experiments.
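For reference, the classical Eulerian numbers A(n, k), which count permutations of n elements with k descents (equivalently, k + 1 ascending runs), satisfy the recurrence A(n, k) = (k + 1) A(n - 1, k) + (n - k) A(n - 1, k - 1). The sketch below builds a row from this recurrence and computes the exact mean and variance of the descent count of a uniformly random permutation, which are known to be (n - 1)/2 and, for n >= 2, (n + 1)/12. It illustrates only the classical distribution; the distribution induced by the modified bucket sort differs from it in the ways the paper describes and is not reproduced here. The function names are illustrative.

```python
from fractions import Fraction


def eulerian_row(n):
    """[A(n, 0), ..., A(n, n-1)] for n >= 1, built from the standard recurrence."""
    row = [1]  # n = 1: the single permutation has no descents
    for m in range(2, n + 1):
        new = [0] * m
        for k in range(m):
            stay = (k + 1) * row[k] if k < m - 1 else 0      # A(m-1, k) term
            shift = (m - k) * row[k - 1] if k >= 1 else 0    # A(m-1, k-1) term
            new[k] = stay + shift
        row = new
    return row


def descent_moments(n):
    """Exact mean and variance of the number of descents of a random permutation of n."""
    row = eulerian_row(n)
    total = sum(row)  # equals n!
    mean = Fraction(sum(k * a for k, a in enumerate(row)), total)
    second = Fraction(sum(k * k * a for k, a in enumerate(row)), total)
    return mean, second - mean ** 2
```

For n = 4 the row is [1, 11, 11, 1], and the moments are 3/2 and 5/12.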
Statistical modeling based on the vector autoregressive (VAR) model has been considered a promising tool for reconstructing large-scale gene networks from time-course microarray data. However, network reconstruction remains challenging because of the small sample size and high dimensionality of such data. We present a novel regression-based modeling strategy with a new class of regularization, called the recursive elastic net. Numerical simulations and real data analysis show that the proposed method outperforms traditional methods.
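The regression setting can be illustrated with a plain elastic net fitted by cyclic coordinate descent to a VAR(1) design, in which each gene at time t is regressed on all genes at time t - 1. This is a substituted, simpler technique: the recursive elastic net's weighting scheme is not described in the abstract, so it is not implemented here. The sketch assumes centered data (no intercept), and the names `elastic_net`, `var1_design`, `lam` and `alpha` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Plain (non-recursive) elastic net by cyclic coordinate descent.

    Minimizes (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).
    X is a list of rows; no intercept is fitted, so X and y are assumed centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b (starts at b = 0)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero predictor keeps coefficient 0
            # Correlation of column j with the partial residual that excludes b[j].
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                b[j] = new
    return b


def var1_design(series, target):
    """Design for regressing gene `target` at time t on every gene at time t-1.

    `series` is a list of time points, each a list of expression values.
    """
    X = [list(series[t - 1]) for t in range(1, len(series))]
    y = [series[t][target] for t in range(1, len(series))]
    return X, y
```

Fitting one such regression per gene and keeping the nonzero coefficients gives a sparse directed network; the L1 part of the penalty produces the zeros, while the L2 part stabilizes the fit when predictors are correlated.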
This paper proposes a sequential pattern mining algorithm based on search position indexing. The algorithm uses two indexes to skip redundant scans of the sequence data during mining. In an experimental evaluation, it slightly outperforms the popular sequential pattern mining algorithms SPAM and PrefixSpan.
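The general idea of jumping through a sequence with a position index, rather than rescanning it, can be sketched as follows. This is not the paper's algorithm, whose two indexes are not described in the abstract. The sketch assumes each sequence is a list of single items rather than itemsets, and the helper names are hypothetical. For each sequence, the index maps an item to its sorted positions, and a binary search jumps directly to the next occurrence after the current match.

```python
from bisect import bisect_right
from collections import defaultdict


def build_position_index(sequence):
    """Map each item to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for pos, item in enumerate(sequence):
        index[item].append(pos)
    return index


def contains(index, pattern):
    """True if `pattern` occurs as a (not necessarily contiguous) subsequence."""
    last = -1
    for item in pattern:
        positions = index.get(item)
        if not positions:
            return False
        i = bisect_right(positions, last)  # first occurrence strictly after `last`
        if i == len(positions):
            return False
        last = positions[i]
    return True


def support(indexes, pattern):
    """Number of indexed sequences that contain `pattern`."""
    return sum(contains(idx, pattern) for idx in indexes)


# Hypothetical database of three sequences, indexed once up front.
db = [["a", "b", "c", "a"], ["b", "a", "c"], ["a", "c", "b"]]
indexes = [build_position_index(s) for s in db]
```

With this database, `support(indexes, ["a", "c"])` counts all three sequences, while `["a", "b"]` is found only in the first and third.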
We develop a nonparametric Bayesian model for extracting relationships between words from a corpus, with the aim of discovering semantic knowledge about words in particular domains. We take the subject-predicate structure, with a noun as the subject and a verb as the predicate, as the syntactic structure and regard it as a graph. The generation of this graph is modeled using the hierarchical Dirichlet process and the Pitman-Yor process. Evaluating the model by the performance of graph clustering based on WordNet similarities demonstrated that it outperforms other baseline models.
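The Pitman-Yor process used in the model can be simulated through its Chinese restaurant representation. With discount d and concentration theta, a new customer joins an existing table k of size n_k with probability proportional to n_k - d, or opens a new table with probability proportional to theta + d*K, where K is the current number of tables. The sketch below samples table sizes under this rule; it does not reproduce the paper's hierarchical word-graph model, and `pitman_yor_crp` is an illustrative name. Setting d = 0 recovers the Dirichlet process.

```python
import random


def pitman_yor_crp(n, discount, concentration, rng=None):
    """Sample table sizes for n customers from a Pitman-Yor Chinese restaurant process.

    Requires 0 <= discount < 1 and concentration > -discount.
    """
    rng = rng or random.Random()
    tables = []
    for _ in range(n):
        if not tables:
            tables.append(1)  # the first customer always opens a table
            continue
        # Existing table k is chosen with weight n_k - d ...
        weights = [size - discount for size in tables]
        # ... and a new table with weight theta + d * K.
        weights.append(concentration + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables


sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0, rng=random.Random(0))
```

A positive discount makes the number of tables grow as a power of n rather than logarithmically, which suits the heavy-tailed word frequencies found in natural-language corpora.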
In recent years, mining the complete set of frequent subgraphs from labeled graph data has been studied extensively. However, to the best of our knowledge, almost no methods have been proposed for finding frequent subsequences of graphs in a set of graph sequences. In this paper, we define a novel class of graph subsequences by introducing axiomatic rules of graph transformation, their admissibility constraints and a union graph. We then propose an efficient approach named "GTRACE" to enumerate frequent transformation subsequences (FTSs) of graphs from a given set of graph sequences. Its fundamental performance was evaluated on artificial datasets, and its practicality was confirmed through experiments on real-world datasets.
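To make the idea of graph transformation concrete, the sketch below lists primitive edits (vertex and edge insertion, deletion and relabeling) that turn one labeled graph in a sequence into the next. This set of operations and the ordering used here are assumptions for illustration: the paper's axiomatic rules, their admissibility constraints and the union graph are not reproduced, and the function names are hypothetical. Edge deletions come before vertex deletions and vertex insertions before edge insertions, so no intermediate graph has a dangling edge.

```python
def edge_key(edge):
    """Deterministic tuple form of an undirected edge stored as a frozenset."""
    return tuple(sorted(edge))


def transformation_ops(g_prev, g_next):
    """List primitive edits turning g_prev into g_next.

    A graph is (vertex_labels, edge_labels): {v: label} and {frozenset({u, v}): label}.
    """
    (v1, e1), (v2, e2) = g_prev, g_next
    ops = []
    # Remove edges first so no edge is left dangling when an endpoint is deleted.
    for e in sorted(e1.keys() - e2.keys(), key=edge_key):
        ops.append(("delete_edge", edge_key(e)))
    for e in sorted(e1.keys() & e2.keys(), key=edge_key):
        if e1[e] != e2[e]:
            ops.append(("relabel_edge", edge_key(e), e2[e]))
    for v in sorted(v1.keys() - v2.keys()):
        ops.append(("delete_vertex", v))
    for v in sorted(v1.keys() & v2.keys()):
        if v1[v] != v2[v]:
            ops.append(("relabel_vertex", v, v2[v]))
    for v in sorted(v2.keys() - v1.keys()):
        ops.append(("insert_vertex", v, v2[v]))
    # Insert edges last so both endpoints already exist.
    for e in sorted(e2.keys() - e1.keys(), key=edge_key):
        ops.append(("insert_edge", edge_key(e), e2[e]))
    return ops


def apply_ops(graph, ops):
    """Replay a list of edits on a copy of `graph`."""
    vertices, edges = dict(graph[0]), dict(graph[1])
    for op in ops:
        kind = op[0]
        if kind == "delete_edge":
            del edges[frozenset(op[1])]
        elif kind in ("relabel_edge", "insert_edge"):
            edges[frozenset(op[1])] = op[2]
        elif kind == "delete_vertex":
            del vertices[op[1]]
        elif kind in ("relabel_vertex", "insert_vertex"):
            vertices[op[1]] = op[2]
    return vertices, edges
```

Concatenating the edit lists for each consecutive pair turns a graph sequence into a sequence of transformations, which is the kind of object in which frequent subsequences can then be sought.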