Bayesian networks (BNs) have become increasingly popular tools for learning and prediction in user modeling. However, because individual training data are often scarce, especially for inactive or new users, treating each user separately is often problematic. In this article, we propose a novel scheme of ‘collaborative’ prediction, in which multiple BNs (one per user) are employed cooperatively to predict a single target user's actions. The core idea is simply to apply the Bayesian principle, albeit in a rather intricate way. Because the method is simple, it is readily applicable to difficult cases in which the structures of the individual BNs are heterogeneous and not known in advance. We demonstrate the usefulness of our method on a real dataset from our motivating user modeling task, printer usage prediction. The results show that our Bayesian collaboration method improves prediction accuracy over both non-collaborative and other naive collaborative approaches, especially when sample sizes are small. The method also appears able to improve low prediction accuracy that may be caused by poor optimization in structure learning.
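The abstract does not spell out how the Bayesian principle is applied, so the following is only an illustrative sketch of one generic possibility, Bayesian model averaging, and not the method of the article. Each user's BN is replaced by a plain categorical distribution over actions as a stand-in, and the function name, the user names, and the "color"/"mono" printer actions are all hypothetical.

```python
from math import exp, log

def collaborative_predict(models, target_history, prior=None):
    """Average per-user predictive models, weighted by how well each
    explains the target user's observed actions (Bayes' rule).

    models: dict user -> dict action -> probability (stand-in for a BN);
            every probability is assumed to be strictly positive.
    target_history: list of actions observed for the target user.
    Returns (posterior weights over models, averaged predictive distribution).
    """
    users = list(models)
    if prior is None:
        prior = {u: 1.0 / len(users) for u in users}  # uniform prior over models

    # log posterior (up to a constant) = log prior + log likelihood
    log_post = {
        u: log(prior[u]) + sum(log(models[u][a]) for a in target_history)
        for u in users
    }
    # Normalize in a numerically stable way.
    m = max(log_post.values())
    unnorm = {u: exp(lp - m) for u, lp in log_post.items()}
    z = sum(unnorm.values())
    weights = {u: w / z for u, w in unnorm.items()}

    # Posterior-weighted mixture of the individual predictions.
    actions = list(next(iter(models.values())))
    predictive = {a: sum(weights[u] * models[u][a] for u in users) for a in actions}
    return weights, predictive

# Hypothetical example: two users' models and a short target history.
example_models = {
    "user_a": {"color": 0.9, "mono": 0.1},
    "user_b": {"color": 0.2, "mono": 0.8},
}
weights, predictive = collaborative_predict(example_models, ["mono", "mono"])
```

With only two observations the weights already favor the model that explains the target user's history better, which is the kind of behavior that could help when individual data are scarce.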
Many studies on learning Bayesian networks have used the Dirichlet prior score metric (DPSM). Although these studies assume different optimum hyper-parameter values for DPSM, few have focused on how to select them. This paper presents analyses of the DPSM hyper-parameters for learning Bayesian networks, with the following results: (1) DPSM is strongly consistent for any hyper-parameter values; that is, DPSM, the uniform prior score metric (UPSM), the likelihood-equivalence Bayesian Dirichlet score metric (BDe), and minimum description length (MDL) asymptotically converge to the same results. (2) The optimal hyper-parameter values are affected by the true network structure and the amount of data. (3) Contrary to the results of Yang and Chang (2002), BDe based on likelihood equivalence is a reasonable score metric both in theory and in practice, provided that the optimum hyper-parameter values can be found. Building on these results, this paper proposes a new method for learning Bayesian networks that is based on BDeu and uses an empirical Bayesian approach. The unique features of this method are: (1) It can reflect a user's prior knowledge. (2) It has both the strong consistency and likelihood equivalence properties. (3) It finds the optimum hyper-parameter value of BDeu to maximize predictive efficiency, adapting to the domain and the data size. In addition, this paper presents numerical examples that demonstrate the effectiveness of the proposed method.
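To make the role of the BDeu hyper-parameter concrete, the sketch below computes the standard log BDeu local score of one node given a parent set, with the equivalent sample size (here called `ess`) as the single hyper-parameter. It is a minimal illustration of the textbook formula, not the paper's selection procedure; the function name, the data layout (a list of dicts), and the toy data are assumptions.

```python
from collections import Counter
from math import lgamma

def bdeu_local_score(data, child, parents, arity, ess=1.0):
    """Log BDeu score of `child` given `parents`.

    data: list of dicts mapping variable name -> integer state
    arity: dict mapping variable name -> number of states
    ess: equivalent sample size, the BDeu hyper-parameter
    """
    r = arity[child]
    q = 1
    for p in parents:
        q *= arity[p]
    a_ij = ess / q          # pseudo-count per parent configuration
    a_ijk = ess / (q * r)   # pseudo-count per (configuration, child state)

    n_ij = Counter()
    n_ijk = Counter()
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_ij[cfg] += 1
        n_ijk[(cfg, row[child])] += 1

    # Configurations and states with zero counts contribute exactly zero,
    # so it is enough to iterate over the observed ones.
    score = 0.0
    for n in n_ij.values():
        score += lgamma(a_ij) - lgamma(a_ij + n)
    for n in n_ijk.values():
        score += lgamma(a_ijk + n) - lgamma(a_ijk)
    return score

# Toy data (hypothetical): B copies A, so the parent set {A} should help.
toy = [{"A": a, "B": a} for a in (0, 1, 0, 1, 1, 0, 0, 1)]
toy_arity = {"A": 2, "B": 2}
for ess in (0.1, 1.0, 10.0):
    gain = (bdeu_local_score(toy, "B", ["A"], toy_arity, ess)
            - bdeu_local_score(toy, "B", [], toy_arity, ess))
    print(f"ess={ess}: score gain from adding arc A -> B = {gain:.3f}")
```

Running the loop shows how the preference for an arc shifts with `ess`, which is why the choice of this value matters for the learned structure.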
This paper proposes a collaborative filtering method for massive datasets that is based on Bayesian networks. We first compare the prediction accuracy of four scoring-based algorithms for learning Bayesian networks (AIC, MDL, UPSM, and BDeu) and two conditional-independence-based (CI-based) algorithms (MWST and Polytree-MWST) using actual massive datasets. The results show that (1) for large networks, the scoring-based algorithms have lower prediction accuracy than the CI-based algorithms and (2) when the scoring-based algorithms use a greedy search to learn a large network, algorithms that produce many arcs tend to have lower prediction accuracy than those that produce fewer arcs. Next, we propose a learning algorithm based on MWST for collaborative filtering of massive datasets. The proposed algorithm employs a traditional data mining technique, the Apriori algorithm, to quickly calculate the mutual information required by MWST from massive datasets. We compare the original MWST algorithm and the proposed algorithm on actual data, and the comparison shows the effectiveness of the proposed algorithm.
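As a rough illustration of the MWST step, the sketch below estimates pairwise mutual information from data and builds a maximum weight spanning tree with Kruskal's algorithm. It uses plain counting rather than the Apriori-based speed-up the paper proposes, and the function names, data layout, and toy data are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(data, x, y):
    """Empirical mutual information (in nats) between columns x and y."""
    n = len(data)
    cx = Counter(row[x] for row in data)
    cy = Counter(row[y] for row in data)
    cxy = Counter((row[x], row[y]) for row in data)
    # sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())

def mwst(data, variables):
    """Maximum weight spanning tree over mutual-information edge weights.

    Returns a list of undirected edges (x, y, weight).
    """
    edges = sorted(
        ((mutual_information(data, x, y), x, y) for x, y in combinations(variables, 2)),
        reverse=True,
    )
    parent = {v: v for v in variables}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for w, x, y in edges:
        rx, ry = find(x), find(y)
        if rx != ry:          # adding the edge does not create a cycle
            parent[rx] = ry
            tree.append((x, y, w))
    return tree

# Toy data (hypothetical): B copies A, while C is unrelated to both.
toy = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)]
tree = mwst(toy, ["A", "B", "C"])
```

The pairwise counts inside `mutual_information` are the expensive part on massive datasets, which is the step the abstract says is accelerated.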
This paper describes a Bayesian network model for a candidate assessment design with four proficiency variables, 48 tasks with 3 to 12 observable outcome variables per task, and scale anchors to identify the location of the subscales. The domain experts' view of the relationships among proficiencies and tasks established a complex prior distribution over 585 parameters. Markov chain Monte Carlo (MCMC) estimation recovered the parameters from data simulated from the expert model. The sample size and the strength of the prior had only a modest effect on parameter recovery, but did affect the standard errors of the estimated parameters. Finally, an identifiability issue involving relabeling of proficiency states and permutations of the matrices is addressed in the context of this study.
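The relabeling issue can be seen in a very small latent-variable model: permuting the labels of the latent proficiency states, together with the matching rows of the conditional probability table, leaves the distribution of the observables unchanged, so the data alone cannot tell the two labelings apart. The sketch below is a toy illustration with made-up numbers, not the model from the paper.

```python
def marginal_likelihood(prior, cpt, outcome):
    """P(outcome) = sum over latent states s of P(s) * P(outcome | s)."""
    return sum(prior[s] * cpt[s][outcome] for s in range(len(prior)))

# Hypothetical two-state proficiency variable and one binary observable.
prior = [0.3, 0.7]          # P(state)
cpt = [[0.8, 0.2],          # P(outcome | state 0)
       [0.4, 0.6]]          # P(outcome | state 1)

# Relabel the states: swap "state 0" and "state 1" everywhere.
perm = [1, 0]
prior_p = [prior[s] for s in perm]
cpt_p = [cpt[s] for s in perm]

# The parameters differ, but the observable distribution is identical.
for outcome in range(2):
    original = marginal_likelihood(prior, cpt, outcome)
    relabeled = marginal_likelihood(prior_p, cpt_p, outcome)
    print(outcome, original, relabeled)
```

Constraints such as the scale anchors mentioned in the abstract are one way to pin down which labeling is intended.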