Multi-task learning aims to transfer knowledge between similar tasks. The multi-task Gaussian process framework of Bonilla et al. models the (incomplete) responses of C data points for R tasks (i.e., an R × C response matrix) with a Gaussian process whose covariance function is the product of a covariance function on input-dependent features and an inter-task covariance matrix (which is estimated empirically as a model parameter). We extend this framework by incorporating a novel similarity measure that allows much more complex data structures to be represented. The proposed framework also lets us exploit additional information (e.g., the input-dependent features) by combining it with the covariance matrices in the covariance function. We also derive an efficient learning algorithm that makes predictions using an iterative method. Finally, we apply our model to a real recommender-system data set and show that the proposed method achieves the best prediction accuracy on that data set.
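The product-form covariance described in this abstract can be illustrated with a minimal sketch. The squared-exponential kernel, the function names `rbf` and `multitask_cov`, and the toy numbers are all assumptions made for illustration; the abstract does not say which input kernel or task matrix is used.

```python
import math

def rbf(x, y, length_scale=1.0):
    """Squared-exponential covariance on input features (an assumed choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def multitask_cov(task_cov, inputs, kernel=rbf):
    """Build the (R*C) x (R*C) covariance whose entry for (task r, input i)
    and (task s, input j) is task_cov[r][s] * kernel(inputs[i], inputs[j]).
    Rows and columns are ordered task-major: index = r * C + i."""
    R, C = len(task_cov), len(inputs)
    K = [[0.0] * (R * C) for _ in range(R * C)]
    for r in range(R):
        for i in range(C):
            for s in range(R):
                for j in range(C):
                    K[r * C + i][s * C + j] = (
                        task_cov[r][s] * kernel(inputs[i], inputs[j])
                    )
    return K

# Toy example: R = 2 tasks, C = 3 one-dimensional inputs (hypothetical values).
task_cov = [[1.0, 0.8], [0.8, 1.0]]
inputs = [(0.0,), (1.0,), (2.0,)]
K = multitask_cov(task_cov, inputs)
```

With this ordering the result is the Kronecker product of the task matrix and the input kernel matrix, so a strong inter-task entry (0.8 here) lets an observation for one task inform predictions for the other at nearby inputs.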
Nodes are often categorized using centrality or similarity measures in order to analyze the structure of complex networks. Sometimes a community structure is used for node categorization. However, few studies categorize nodes based on multiple characteristic properties that can be defined at each node, such as the degree, the local clustering coefficient, and the local geodesic distance. In this study, we propose a new categorization method for nodes in complex networks. First, we calculate several local characteristic properties at each node and define the attribute vector of the node, each component of which corresponds to one of these properties. Second, the nodes are categorized by clustering the multivariate data, i.e., the attribute vectors. A simple SOM-based clustering method is used in this paper. Finally, an example is presented to show how well the proposed method works. We also show the effectiveness of our categorization method for the analysis of simulations on networks.
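As a rough illustration of the first step, the sketch below builds an attribute vector for each node from two of the properties named in the abstract, the degree and the local clustering coefficient. The function name `attribute_vectors` and the toy graph are hypothetical, and the SOM-based clustering step is not shown.

```python
def attribute_vectors(adj):
    """Map each node to (degree, local clustering coefficient).

    `adj` is an undirected graph given as {node: set of neighbours}."""
    vectors = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count the edges that exist among the neighbours of `node`.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        # Fraction of neighbour pairs that are connected; 0 when k < 2.
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        vectors[node] = (k, clustering)
    return vectors

# Toy graph: a triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
vectors = attribute_vectors(adj)
```

The resulting vectors could then be passed to any multivariate clustering routine; further properties would simply add components to each tuple.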
The solid oxide fuel cell (SOFC) is an efficient generator that is being researched for practical use. However, durability is one of its problems. In this study, we investigate the mechanical correlations among the components of an SOFC by analyzing the co-occurrence of acoustic emission (AE) events caused by damage. To this end, we propose a novel method for mining patterns from numerical data such as AE. The conventional method can run into problems when mining patterns from numerical data: in the clustering step, clusters may contain data that do not contribute to a given pattern, or may miss data that do contribute to it. In contrast, the proposed method extracts patterns of two clusters by considering the co-occurrence between clusters and the similarity within each cluster at the same time. In addition, the dendrogram obtained from hierarchical clustering is used to reduce the search space. First, we evaluate the performance of the proposed method on artificial data and demonstrate that it yields appropriate clusters corresponding to the patterns. Then, we apply the proposed method to AE data and extract the damage patterns that represent the major mechanical correlations. The results provide novel knowledge about the damage mechanism of SOFCs.
Using crowdsourcing services to collect large amounts of labeled data for machine learning has attracted considerable attention, since such services allow one to ask the general public to label data at very low cost through the Internet. The use of crowdsourcing has introduced a new challenge in machine learning: coping with the low quality of crowd-generated data. There have been many recent attempts to address the quality problem of multiple labelers; however, the existing approaches have two serious drawbacks, namely (i) non-convexity and (ii) task homogeneity. Most existing methods treat true labels as latent variables, which results in non-convex optimization problems. Also, the existing models assume only a single homogeneous task, whereas in realistic situations clients can offer multiple tasks to the crowd, and crowd workers can work on different tasks in parallel. In this paper, we propose a convex optimization formulation of learning from crowds that introduces personal models of individual crowd workers without estimating true labels. We further extend the proposed model to multi-task learning, based on the resemblance between the proposed formulation and that of an existing multi-task learning model. We also devise efficient iterative methods for solving the convex optimization problems by exploiting conditional independence structures among the multiple classifiers.
Financial markets fluctuate constantly. It is therefore difficult to analyze a financial market with a single theory that does not depend on the state of the market, so we use the concept of market condition change. Estimating the points at which market changes occurred in a real market is effective for market analysis. Thus, in this paper, we propose a method for detecting changes in market conditions. The proposed method focuses on the stock board instead of the price data. From the stock board data, we classify short time series into clusters using the k-means clustering method. Then, we generate a Hidden Markov Model (HMM) from the transition probabilities between the clusters. Using the likelihood of the HMM, we analyze the similarities between time series. We performed an experiment to evaluate the effectiveness of the method through discriminant analysis of time series created from the opening session and the continuous session. As a result, the two kinds of time series were discriminated with high accuracy. Finally, we compared the discrimination performance of the proposed method with that of other discriminant analysis methods. We used three types of time series data, consisting of stock board and price data, from before the financial crisis triggered by Lehman's fall. The results show that the proposed method achieves the best performance in discriminating the financial data.
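The idea of scoring a time series by a likelihood derived from cluster transitions can be sketched as follows. This is a simplification under stated assumptions: it uses a first-order Markov chain over cluster labels rather than a full HMM, it assumes the k-means labels are already available, and the names `transition_matrix` and `log_likelihood`, the add-one smoothing, and the toy label sequences are all hypothetical.

```python
import math

def transition_matrix(labels, n_states, alpha=1.0):
    """Estimate transition probabilities between cluster labels.

    `alpha` is an additive smoothing constant (an assumption) that keeps
    unseen transitions from getting probability zero."""
    counts = [[alpha] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def log_likelihood(labels, trans):
    """Log-probability of the observed label transitions under `trans`."""
    return sum(math.log(trans[a][b]) for a, b in zip(labels, labels[1:]))

# Hypothetical cluster labels, e.g. produced by k-means on short windows.
train = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
trans = transition_matrix(train, n_states=2)

# A sequence with similar dynamics should score higher than a dissimilar one.
similar = log_likelihood([0, 0, 1, 1, 0, 0], trans)
different = log_likelihood([0, 1, 0, 1, 0, 1], trans)
```

Comparing such scores across models fitted to different sessions gives one simple way to decide which session a new series resembles most.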
The purpose of this study is to develop efficient methods for the minimum-consistent DFA (deterministic finite-state automaton) problem. The graph-coloring-based SAT (satisfiability) approach proposed by Heule is a state-of-the-art method for this problem. It performs especially well on dense problems, such as a popular benchmark problem that includes rich information about labels. In contrast, solving sparse problems remains a challenge for the minimum-consistent DFA problem. To solve sparse problems, we propose three additions to the SAT formulation: a) the binary color representation, b) the dynamic symmetry breaking, and c) the hyper-graph coloring constraint. We conducted an experiment using the existing benchmark problems and sparse problems derived from them. We observed that our symmetry-breaking constraints reduced the running time of the SAT solver. In addition, our other proposed methods showed the potential to improve performance. We then simulated the performance of our methods under the condition that several program set-ups are executed in parallel. Compared with the results of previous research, we reduced the average relative time by 66.5% and the total relative time by 7.6% for sparse problems, and by 79.7% and 38.5%, respectively, for dense problems. These results show that our proposed methods are effective for difficult problems.
Various methods for music retrieval have been proposed. Recently, many researchers have been developing methods based on the relationship between music and feelings. In our previous psychological study, we found a significant correlation between the colors evoked by songs and the colors evoked by the lyrics alone, which showed that a music retrieval system using lyrics could be developed. In this paper, we focus on the relationship among music, lyrics, and colors, and propose a music retrieval method that uses colors as queries and analyzes lyrics. The method estimates the colors evoked by songs by analyzing their lyrics. In the first step, words associated with colors are extracted from the lyrics. We consider two methods for extracting these words. In the first method, the words are extracted based on the result of a psychological experiment. In the second method, in addition to the words extracted based on the psychological experiment, words from the corpora used for Latent Semantic Analysis are extracted. In the second step, the colors evoked by the extracted words are compounded, and the compounded colors are regarded as those evoked by the song. In the last step, the query colors are compared with the colors estimated from the lyrics, and a list of songs is presented based on the similarities. We evaluated the two methods and found that the method based on both the psychological experiment and the corpora performed better than the method based on the psychological experiment alone. These results show that the method using colors as queries and analyzing lyrics is effective for music retrieval.
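The three steps described in this abstract can be sketched in a few lines. Everything specific here is an assumption for illustration: the word-to-color table `WORD_COLORS`, the use of an RGB average to compound colors, the Euclidean distance as the similarity measure, and the toy songs. The abstract does not state which color space, compounding rule, or similarity the authors use.

```python
import math

# Hypothetical word-to-colour associations (RGB). In the abstract these come
# from a psychological experiment and from corpora; the values here are made up.
WORD_COLORS = {
    "sea": (0, 0, 255),
    "sun": (255, 255, 0),
    "rose": (255, 0, 0),
    "forest": (0, 128, 0),
}

def song_color(lyrics):
    """Steps 1 and 2: extract colour words, then compound by averaging RGB."""
    colors = [WORD_COLORS[w] for w in lyrics.lower().split() if w in WORD_COLORS]
    if not colors:
        return None  # no colour-associated word was found in the lyrics
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

def rank_songs(query, songs):
    """Step 3: order songs by distance between query and estimated colour."""
    scored = []
    for title, lyrics in songs.items():
        color = song_color(lyrics)
        if color is not None:
            scored.append((math.dist(query, color), title))
    return [title for _, title in sorted(scored)]

songs = {
    "Ocean": "the sea and the sea",
    "Summer": "sun over the rose",
    "Silence": "nothing here",
}
ranking = rank_songs((0, 0, 255), songs)  # query colour: pure blue
```

Songs without any color-associated word are left out of the ranking in this sketch; a real system would need a policy for them.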
Although many definitions of service have been proposed in Service Science and Service Engineering, the essential nature of the notion of "service" remains unclear. In particular, some existing definitions of service are similar to the definition of the function of artifacts, and there is no clear distinction between them. Thus, aiming at an ontological conceptualization of service, we have made an ontological investigation into the distinction between service and artifact function. In this article, we reveal essential properties of service and propose a model and a definition of service. First, we extract 42 properties of service from 15 articles in different disciplines in order to identify the fundamental concepts of service. We then show that the notion of function shares the extracted fundamental concepts of service, and thus point out the need to distinguish between them. Second, we propose a multi-layered model of services, which is based on the conceptualization of goal-oriented effects at the base level and at the upper level. Third, based on the model, we clarify the essential properties of service that distinguish it from artifact function. The conceptualization of upper-effects (upper-service) enables us to show that upper-services include various effects such as sales and manufacturing. Lastly, we propose a definition of the notion of service based on the essential properties and show its validity using some examples.
This paper proposes a constrained clustering method based on iterative data division by a constrained graph-cutting approach. Since our proposed constrained graph cut problem is formalized as a semidefinite program, must-link constraints can be incorporated into the problem naturally. Although the solution matrix obtained by solving the problem does not directly reflect cluster membership, we introduce an efficient heuristic algorithm that produces clusters without complex procedures such as matrix decomposition. In the experiments, we compared our method with other state-of-the-art and well-known clustering techniques on the UCI repository and CLUTO datasets. Our method outperformed, or was comparable to, the other methods on more than half of the datasets. We also compared the computational cost. Although our method tends to take more time to compute, its total cost is at most double that of another SDP-based method.
In systems biology, identifying vital functions such as glycolysis in a given metabolic pathway is important for understanding living organisms. In this paper, we focus on the problem of enumerating minimal active pathways that produce target metabolites from source metabolites. We represent the problem as propositional formulas and solve it through minimal model generation. An advantage of our method is that each solution satisfies the qualitative laws of biochemical reactions. Moreover, we can compute such solutions for a cell-scale metabolic pathway within a few seconds. In experiments, we applied our method to a whole Escherichia coli metabolic pathway. As a result, we found a minimal set of reactions corresponding to the conventional glycolysis pathway described in the biological database EcoCyc.
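To make the notion of a minimal pathway concrete, the sketch below enumerates minimal sets of reactions that produce a target metabolite from source metabolites. It is a brute-force stand-in under stated assumptions, not the propositional encoding or the minimal model generation used in the paper; the function names and the toy reaction network with metabolites A to E are hypothetical.

```python
from itertools import combinations

def producible(sources, reactions):
    """Forward closure: a reaction fires once all its substrates are available."""
    available = set(sources)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def minimal_pathways(sources, target, reactions):
    """Enumerate subset-minimal reaction sets that make `target` producible.

    Subsets are tried in order of increasing size, and supersets of an
    already found solution are skipped, so every result is minimal."""
    names = sorted(reactions)
    found = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            chosen = set(subset)
            if any(prev <= chosen for prev in found):
                continue
            if target in producible(sources, [reactions[n] for n in subset]):
                found.append(chosen)
    return found

# Toy network: name -> (substrates, products). Two routes lead from A to C.
reactions = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"C"}),
    "r3": ({"A"}, {"D"}),
    "r4": ({"D"}, {"C"}),
    "r5": ({"C"}, {"E"}),
}
pathways = minimal_pathways({"A"}, "C", reactions)
```

The enumeration is exponential in the number of reactions, which is precisely why a dedicated solver is needed at the scale the abstract reports.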
Collaborative filtering is widely used as a method for recommending items to customers. However, when recommending academic books, it is necessary to consider the difficulty of the books and each user's level of knowledge as well as the user's preferences. If a recommendation method considers only the user's preferences, users might regret buying or reading a recommended book because it does not match their level. In this paper, we focus on academic books and propose a method that estimates the difficulty of academic books using user reviews. Estimating the difficulty of books helps users search for, and be recommended, academic books that match their skill. Moreover, we evaluated our method by applying it to academic textbooks on the C programming language. We verified that, for academic books, our method is more effective than traditional methods.
This paper proposes a framework for predicting the future significance or importance of nodes in a network through link prediction. The network can be of any kind, such as a co-authorship network in which nodes are authors and co-authors are linked by edges. In this example, predicting significant nodes means discovering authors who will be influential in the future. Existing approaches to predicting such significant nodes in a future network typically rely on existing relationships between nodes. However, since such relationships are dynamic and naturally change over time (e.g., new co-authorships continue to emerge), approaches based only on the current status of the network have limited potential to predict the future. In contrast, our proposed approach first predicts future links between nodes with multiple supervised classifiers and then applies the RankBoost algorithm to combine the predictions so that the links lead to more precise predictions of a centrality (significance) measure of our choice. To demonstrate the effectiveness of our proposed approach, a series of experiments is carried out on the arXiv (HEP-Th) citation data set.
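The overall flow, predicting links first and then computing a centrality measure on the predicted network, can be sketched as follows. The link scores here are made-up numbers standing in for the combined classifier output; the supervised classifiers and RankBoost are not implemented. Degree centrality, the `top_k` cut-off, and the function names are assumptions chosen for illustration.

```python
def degree_centrality(nodes, edges):
    """Degree of each node divided by the maximum possible degree."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {n: d / (len(nodes) - 1) for n, d in degree.items()}

def predicted_centrality(nodes, edges, link_scores, top_k):
    """Add the `top_k` highest-scoring new links, then compute centrality."""
    existing = {frozenset(e) for e in edges}
    candidates = sorted(
        ((score, pair) for pair, score in link_scores.items()
         if frozenset(pair) not in existing),
        reverse=True,
    )
    future_edges = list(edges) + [pair for _, pair in candidates[:top_k]]
    return degree_centrality(nodes, future_edges)

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]
# Hypothetical scores for candidate links (higher means more likely to appear).
link_scores = {("a", "c"): 0.9, ("c", "d"): 0.7, ("a", "d"): 0.2, ("a", "b"): 0.95}

now = degree_centrality(nodes, edges)
future = predicted_centrality(nodes, edges, link_scores, top_k=2)
```

In this toy network node b is the most central today, while node c becomes the most central once the two predicted links are added, which is the kind of shift a purely current-status approach would miss.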
Is-a hierarchies form the foundation of ontologies. That is, the is-a hierarchies in an ontology reflect how the ontology captures the essential conceptual structure of the target world. Therefore, in ontological theories, an is-a hierarchy should be single-inheritance, because a thing cannot have more than one essential property. However, we cannot avoid multi-perspective issues when we build an ontology, because users often want to understand things from their own viewpoints. To tackle this multi-perspective issue, the authors take the approach of dynamically generating is-a hierarchies according to the viewpoints of users from an ontology that uses single inheritance. This article discusses a framework for dynamic is-a hierarchy generation, together with an ontological consideration of the is-a hierarchies it generates. The authors then show its implementation as a new function of Hozo and its application to a medical ontology for the dynamic generation of is-a hierarchies of diseases. Through this function, users can understand an ontology from a variety of viewpoints. As a result, it can contribute to a comprehensive understanding of the ontology and its target world.