Peculiarity oriented mining (POM) is different from, and complementary to, existing approaches for discovering new, surprising and interesting patterns hidden in data. A main task in mining peculiarity rules is peculiarity identification. Previous methods for finding peculiar data are attribute-based. This paper extends peculiarity oriented mining to relational peculiarity oriented mining (RPOM), in which peculiar data are identified at the record level. Experimental results on image sequences from tracking multiple walking people show that the proposed RPOM approach is effective.
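Record-level peculiarity identification can be illustrated with a minimal sketch. The abstract does not define the scoring rule, so the following are assumptions: each record gets a peculiarity factor equal to the sum of its distances to all other records raised to a power `alpha`, Euclidean distance is the record distance, and a record counts as peculiar when its factor exceeds the mean by `beta` standard deviations. The function names are hypothetical.

```python
import math


def peculiarity_factors(records, alpha=0.5):
    """Assumed score: PF(r_i) = sum_j dist(r_i, r_j) ** alpha."""
    factors = []
    for record in records:
        total = 0.0
        for other in records:
            # Whole records are compared, not single attributes.
            total += math.dist(record, other) ** alpha
        factors.append(total)
    return factors


def peculiar_indices(records, alpha=0.5, beta=1.0):
    """Indices of records whose factor exceeds mean + beta * std."""
    pf = peculiarity_factors(records, alpha)
    mean = sum(pf) / len(pf)
    std = math.sqrt(sum((v - mean) ** 2 for v in pf) / len(pf))
    threshold = mean + beta * std
    return [i for i, v in enumerate(pf) if v > threshold]


# Four clustered records and one record far from the rest.
records = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(peculiar_indices(records))
```

The exponent `alpha` damps the influence of very large distances, and `beta` controls how strict the threshold is; both are tuning choices in this sketch, not values taken from the paper.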
We propose a novel probabilistic image selection method for the Web image gathering system we proposed previously. That system employed two-step processing: (1) gather HTML files of Web pages related to given keywords, analyze them, and fetch only the Web images expected to be highly related to the keywords; (2) select only relevant images from the gathered images based on image-feature-based clustering. In this paper, we propose building a generative model based on the Gaussian mixture model to represent the distribution of image features of images related to the given keywords, and applying it to select images in place of step (2). We call the new system ``Probabilistic Image Collector''. Experimental results show the effectiveness of the proposed system.
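The selection step can be sketched as scoring each image feature vector under a Gaussian mixture and keeping the images whose log-likelihood clears a threshold. This is only an illustration under stated assumptions: the mixture parameters are taken as already fitted (for example by EM, which is not shown), covariances are diagonal, and the threshold rule and function names are hypothetical rather than taken from the paper.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x | mean_k, var_k), via log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def select_images(features, weights, means, variances, threshold):
    """Indices of feature vectors whose log-likelihood is >= threshold."""
    return [
        i
        for i, x in enumerate(features)
        if gmm_log_likelihood(x, weights, means, variances) >= threshold
    ]


# Assumed two-component model and three toy 2-D feature vectors.
weights = [0.5, 0.5]
means = [(0.0, 0.0), (5.0, 5.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]
features = [(0.1, -0.2), (5.2, 4.9), (20.0, 20.0)]
print(select_images(features, weights, means, variances, threshold=-6.0))
```

Compared with hard clustering, a likelihood score gives every image a graded relevance value, so the cutoff can be tuned without re-clustering.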
Motivation is one of the most important factors in learning. Many researchers of learning environments therefore pay special attention to learning games as a promising way to realize highly motivated learning. However, making a learning game is not an easy task. Although design methods for learning games have been investigated, most of the results are only proposals of design guidelines or of characteristics that learning games should have. Developers of learning games are therefore required to have sufficient knowledge and experience of both learning and games in order to understand the guidelines or to deal with the characteristics. As a result, it is very difficult for teachers to obtain learning games suited to their learning issues.
In this paper, we propose cAS, a new ACO algorithm, and evaluate its performance using TSP instances available at TSPLIB. The results show that cAS works well on the test instances and may be one of the most promising ACO algorithms. We also evaluate cAS combined with the LK local search heuristic on larger TSP instances; these results also show promising performance. cAS introduces two important schemes. One is a colony model divided into units, which has a stronger exploitation feature while maintaining a certain degree of diversity among units. The other is a scheme we call cunning, used when constructing new solutions, which can prevent premature stagnation by reducing strong positive feedback to the trail density.
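The abstract does not spell out how the cunning scheme constructs a solution. One possible reading, stated here as an assumption, is that a new ant copies a contiguous segment of an existing tour and samples only the remaining cities in proportion to the pheromone trail, so fewer edges per iteration are drawn from (and reinforce) the trail. The function name, the `borrow_len` parameter, and the omission of any distance heuristic are all hypothetical choices for this sketch.

```python
import random


def cunning_tour(donor, pheromone, borrow_len, rng):
    """Build a tour by borrowing part of `donor` and sampling the rest.

    donor:      an existing tour (a permutation of city indices)
    pheromone:  pheromone[i][j] > 0 is the trail density on edge (i, j)
    borrow_len: number of consecutive donor cities copied unchanged
    rng:        a random.Random instance
    """
    n = len(donor)
    borrow_len = min(borrow_len, n)
    start = rng.randrange(n)
    # Borrowed segment, read cyclically from the donor tour.
    tour = [donor[(start + k) % n] for k in range(borrow_len)]
    unvisited = set(donor) - set(tour)
    if not tour:
        # Nothing borrowed: start from a random city.
        first = rng.choice(sorted(unvisited))
        tour.append(first)
        unvisited.remove(first)
    while unvisited:
        current = tour[-1]
        candidates = sorted(unvisited)
        # Sample the next city in proportion to the pheromone on the edge.
        weights = [pheromone[current][c] for c in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


donor = [0, 1, 2, 3, 4, 5]
pheromone = [[1.0] * 6 for _ in range(6)]
print(cunning_tour(donor, pheromone, borrow_len=3, rng=random.Random(0)))
```

A larger `borrow_len` keeps more of the existing solution and samples less from the trail; how that length is chosen is left open here.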
Genetic Programming (GP) is a powerful optimization algorithm that employs crossover as its genetic operator. Because the crossover operator in GP selects sub-trees at random, building blocks may be destroyed by crossover. Recently, algorithms called PMBGPs (Probabilistic Model Building GP), based on probabilistic techniques, have been proposed to address this problem. We propose a new PMBGP that employs a Bayesian network to generate new individuals, together with a special chromosome called the expanded parse tree, which greatly reduces the number of possible symbols at each node. A large number of symbols gives rise to a large conditional probability table and requires many samples to estimate the interactions among nodes; the use of the expanded parse tree overcomes these problems. Computational experiments on two problems demonstrate that our new PMBGP is far superior to prior probabilistic models.
Social relations play an important role in real communities. Interaction patterns reveal relations among actors (such as persons, groups, and companies), which can be merged into valuable information as a network structure. In this paper, we propose a new approach to extracting inter-business relationships from the Web. The relation between a pair of companies is extracted by using a search engine and text processing. Since names of companies can co-appear coincidentally on the Web, we propose an advanced algorithm characterized by the addition of keywords (which we call relation words) to a query. The relation words are obtained from either an annotated corpus or the Web. We show some examples and comprehensive evaluations of our approach.
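The idea of adding relation words to a query can be sketched as follows. This is a minimal illustration under assumptions: `hit_count` is a hypothetical callable that returns the number of search results for a query string (no real search engine API is implied), the query syntax is a plain quoted-phrase form, and the score, the share of co-occurrence pages that also mention the relation word, is one simple choice rather than the paper's measure.

```python
def build_query(company_a, company_b, relation_word=None):
    """Quote both company names and optionally append a relation word."""
    parts = [f'"{company_a}"', f'"{company_b}"']
    if relation_word:
        parts.append(relation_word)
    return " ".join(parts)


def relation_scores(company_a, company_b, relation_words, hit_count):
    """Share of co-occurrence hits that also contain each relation word.

    hit_count is a hypothetical function: query string -> number of hits.
    """
    base = hit_count(build_query(company_a, company_b))
    scores = {}
    for word in relation_words:
        hits = hit_count(build_query(company_a, company_b, word))
        # With no co-occurrence at all, every relation score is zero.
        scores[word] = hits / base if base else 0.0
    return scores


# Stand-in for a search engine: made-up counts for made-up companies.
fake_counts = {
    '"Acme" "Globex"': 200,
    '"Acme" "Globex" alliance': 50,
    '"Acme" "Globex" lawsuit': 0,
}
scores = relation_scores(
    "Acme", "Globex", ["alliance", "lawsuit"],
    lambda query: fake_counts.get(query, 0),
)
print(scores)
```

Passing the hit counter in as a parameter keeps the scoring logic testable without network access and independent of any particular search engine.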
Numerous studies on the formation of equivalence relations and on causal induction have shown that human beings tend to treat conditional statements ``if p then q'' as biconditional statements ``if and only if p then q'': we call the tendency to perceive ``if p then q'' as ``if q then p'' the ``symmetry bias''. On the other hand, many studies on children's word learning have pointed out that children tend to expect each object to have only one label. This is the so-called ``mutual exclusivity bias''. This bias implies
Linear-chain conditional random fields are a state-of-the-art machine learner for sequential labeling tasks. Altun investigated various loss functions for linear-chain conditional random fields. Tsuboi introduced a smoothing method between a point-wise loss function and a sequential loss function. Sarawagi proposed semi-Markov conditional random fields, in which a variable-length span of observed tokens is regarded as one node in the lattice. We propose a smoothing method among several loss functions for semi-Markov conditional random fields. We compare the loss functions and smoothing rate settings on base phrase chunking and named entity recognition tasks.
In conceptual design, it is important to develop functional structures that reflect the rich knowledge gained from previous design failures. In particular, if a designer learns of possible abnormal behaviors from a previous design failure, he or she can add a function that prevents such abnormal behaviors and faults. To do this, it is crucial to share knowledge about possible faulty phenomena and how to cope with them. In fact, part of such knowledge is described in FMEA (Failure Mode and Effect Analysis) sheets, function structure models for systematic design, and fault trees for FTA (Fault Tree Analysis).
To overcome the limitation of conventional text-mining approaches, in which frequent patterns of word occurrences are extracted to understand obvious user needs, this paper proposes an approach that extracts the questions behind messages in order to understand potential user needs. We first extract characteristic case frames by comparing the case frames constructed from target messages with those constructed from 25M sentences from the Web and 20M sentences from 20 years of newspaper articles. We then extract the questions behind messages by transforming the characteristic case frames into interrogative sentences based on new information and old information, i.e., by replacing new information with WH-question words. The proposed approach is, in other words, a kind of classification of word occurrence patterns. Qualitative evaluations of our preliminary experiments suggest that the extracted questions show problem consciousness and alternative solutions -- all of which help to understand potential user needs.