The task of inducing grammar structures has received a great deal of attention. Researchers have studied grammar induction for different reasons: to use it as the first stage in building large treebanks, or to build better language models. However, grammar induction has inherent computational complexity. To overcome it, some grammar induction algorithms add new production rules incrementally, refining the grammar while keeping the computational complexity low. In this paper, we propose a new efficient grammar induction algorithm. Although our algorithm is similar to algorithms that learn a grammar incrementally, it uses the graphical EM algorithm instead of the Inside-Outside algorithm. We report the results of learning experiments in terms of learning speed. The results show that our algorithm learns a grammar in constant time regardless of the size of the grammar. Since our algorithm decreases syntactic ambiguity in each step, it reduces the time required for learning, and this constant-time learning considerably affects the learning time for larger grammars. We also report the results of evaluating criteria for choosing nonterminals. Our algorithm refines a grammar based on one nonterminal in each step; since there can be several criteria for deciding which nonterminal is best, we evaluate them through learning experiments.
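The refinement step described above can be pictured as a loop that, at each iteration, selects one nonterminal by some criterion and splits it to reduce ambiguity. The following is a minimal sketch of that loop's skeleton only; the frequency-based criterion, the grammar encoding, and all names are illustrative assumptions, not the paper's actual algorithm or the graphical EM re-estimation itself.

```python
# Toy sketch of an incremental refinement step: pick one nonterminal by a
# criterion (here, hypothetically, its usage frequency in parses) and
# split it into two new specialised nonterminals. Re-estimation of rule
# probabilities (e.g. by graphical EM) is omitted.

def choose_nonterminal(usage_counts):
    """Pick the nonterminal to refine: here, the most frequently used one."""
    return max(usage_counts, key=usage_counts.get)

def refine(grammar, usage_counts, step):
    nt = choose_nonterminal(usage_counts)
    a, b = f"{nt}_{step}a", f"{nt}_{step}b"
    # Split nt into two new nonterminals; each initially copies nt's rules.
    grammar[a] = list(grammar[nt])
    grammar[b] = list(grammar[nt])
    return grammar, nt

grammar = {"NP": [("DT", "NN")], "VP": [("V", "NP")]}
usage = {"NP": 120, "VP": 45}
grammar, refined = refine(grammar, usage, step=1)
print(refined)   # NP is chosen first under this criterion
```

Because each step touches only one nonterminal, the per-step cost can stay roughly constant even as the grammar grows, which matches the constant-time behavior the abstract reports.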
Data mining to derive frequent subgraphs from a dataset of general graphs has high computational complexity, because it involves a combinatorially explosive search for candidate subgraphs together with subgraph isomorphism matching. Although some approaches have been proposed to derive characteristic patterns from graph-structured data, they limit the graphs to be searched to a specific class. In this paper, we propose an approach that conducts a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs within practical time. The power of our approach comes from the algebraic representation of graphs, its associated operations, and well-organized bias constraints that limit the search space efficiently. Its performance has been evaluated on real-world datasets, and the high scalability of our approach has been confirmed with respect to the amount of data and the computation time.
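The candidate-generation and support-counting loop at the heart of frequent-subgraph mining starts from its simplest level: counting which labeled edges occur in enough graphs, so that larger candidates are grown only from frequent ones. The sketch below shows just that first level; the tuple-set graph encoding is a simplified stand-in and not the paper's algebraic representation.

```python
# First Apriori-style level of frequent-subgraph mining: count, for each
# labeled edge (node label, edge label, node label), how many graphs in
# the dataset contain it, and keep those meeting a minimum support.
from collections import Counter

def frequent_edges(graphs, min_support):
    support = Counter()
    for g in graphs:              # g: set of (label_u, edge_label, label_v)
        for edge in set(g):       # count each edge type at most once per graph
            support[edge] += 1
    return {e: s for e, s in support.items() if s >= min_support}

graphs = [
    {("C", "single", "O"), ("C", "double", "O")},
    {("C", "single", "O"), ("C", "single", "N")},
    {("C", "single", "O")},
]
print(frequent_edges(graphs, min_support=2))
# ('C', 'single', 'O') survives: it appears in all three graphs
```

Pruning at this level is what keeps the otherwise explosive search tractable: any subgraph containing an infrequent edge cannot itself be frequent.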
To apply reinforcement learning to difficult classes of problems such as real-environment learning, we need a method robust to the perceptual aliasing problem. Exploitation-oriented methods such as Profit Sharing can deal with the perceptual aliasing problem to a certain extent. However, when the agent needs to select different actions for the same sensory input, learning efficiency worsens. To overcome this problem, several state partition methods using history information of state-action pairs have been proposed. These methods try to convert a POMDP environment into an MDP environment, and thus they are sometimes very useful. However, their computation cost is very high, especially in large state spaces. In contrast, memory-less approaches try to escape from aliased states by outputting actions stochastically. However, these methods output actions stochastically even in unaliased states, and thus their learning efficiency is poor. If we wish to guarantee rationality in POMDPs, it is efficient to output actions stochastically only in aliased states and to output one action deterministically in the other, unaliased states. Hence, to discriminate between aliased and unaliased states, Miyazaki et al. proposed the use of the χ² goodness-of-fit test. They point out that, in aliased states, the distributions of state transitions under random search and under a particular policy are different, and that this difference does not arise from non-deterministic actions. Hence, if the agent can collect enough samples to perform the test, it can distinguish aliased states from unaliased states well. However, such a test needs a large amount of data, and how the agent collects samples without worsening learning efficiency is a problem. If the agent uses random search in the course of learning, learning efficiency worsens, especially in unaliased states.
Therefore, in this research, we propose a new method called Extended On-line Profit Sharing with Judgement (EOPSwJ) to detect important incomplete perceptions; it requires neither a large computation cost nor numerous samples. We use two criteria for detecting incomplete perceptions that are important for attaining a task: one is the rate of transitions to each state, and the other is the deterministic rate of actions. We confirm the effectiveness of EOPSwJ in two simulations.
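The aliasing test discussed above can be sketched concretely: compare the state-transition counts observed under random search with those observed under a particular policy using a χ² test of homogeneity, and flag the state as aliased when the statistic is large. The counts, the three-outcome setting, and the 5% threshold below are illustrative; this is a generic χ² computation, not the authors' exact procedure.

```python
# Chi-square test of homogeneity on two observed count vectors over the
# same set of next states. A statistic above the critical value suggests
# the two transition distributions differ, i.e. the state may be aliased.

def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic for a 2 x k contingency table of counts."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        for obs, row in ((a, total_a), (b, total_b)):
            expected = row * col / grand
            stat += (obs - expected) ** 2 / expected
    return stat

random_counts = [30, 35, 35]   # next-state counts under random search
policy_counts = [70, 20, 10]   # next-state counts under the learned policy
CRITICAL_DF2_5PCT = 5.991      # chi-square critical value, df = 2, alpha = 0.05
stat = chi_square_homogeneity(random_counts, policy_counts)
print(stat > CRITICAL_DF2_5PCT)   # True: this state looks aliased
```

The data-hunger mentioned in the abstract is visible here: each count vector must be large enough for the expected cell counts to make the χ² approximation valid, which is exactly why sample collection without degrading learning is the hard part.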
In this paper, we discuss the importance and use of personal networks in a community system, based on the management and analysis of a scheduling support system for academic conferences. The important feature of the system is the generation and use of personal networks to support information exchange and information discovery among participants. We applied this system to the academic conference JSAI2003, obtaining 276 users and their personal networks. We found that (1) most participants were willing to contribute to forming personal networks, (2) personal networks can promote information exchange among participants, since a personal network makes participants visible to one another, and (3) the formed networks were useful for information recommendation.
In representing classification rules by decision trees, simplicity of tree structure is as important as predictive accuracy, especially in view of comprehensibility to humans, memory capacity, and the time required for classification. Trees tend to become complex when they attain high accuracy. This paper proposes a novel method for generating accurate and simple decision trees based on symbiotic evolution. A distinctive feature of symbiotic evolution is that two different populations are evolved in parallel through genetic algorithms. In our method, the individuals of one population are partial trees of height 1, and the individuals of the other are whole trees represented as combinations of the former individuals. In general, overfitting to the training examples prevents high predictive accuracy. To circumvent this difficulty, individuals are evaluated not only by their accuracy on the training examples but also by the correct-answer biased rate, which indicates the dispersion of correct answers over the terminal nodes. Based on this method, we developed a system called SESAT for generating decision trees. Our experimental results show that SESAT compares favorably with other systems on several datasets from the UCI repository. SESAT can generate simpler trees than C5.0 without sacrificing predictive accuracy.
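The two-population representation described above can be sketched as follows: one population holds height-1 partial trees (a single test with two child slots), and a whole tree is assembled by letting a child slot point either to a class label or to another partial tree. Only the representation and evaluation are shown; selection, crossover, and the correct-answer biased rate are omitted, and all names and data here are illustrative.

```python
# A partial tree is a single test "x[feature] <= threshold" whose children
# are either class labels (str) or indices of other partial trees. A whole
# tree is the composition reachable from a chosen root index.

partials = [
    {"feature": 0, "threshold": 2.5, "left": "A", "right": 1},
    {"feature": 1, "threshold": 1.0, "left": "A", "right": "B"},
]

def classify(x, node_index):
    node = partials[node_index]
    child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return child if isinstance(child, str) else classify(x, child)

def accuracy(root, examples):
    """Fitness component: fraction of training examples classified correctly."""
    return sum(classify(x, root) == y for x, y in examples) / len(examples)

examples = [([1.0, 0.5], "A"), ([3.0, 0.5], "A"), ([3.0, 2.0], "B")]
print(accuracy(0, examples))   # 1.0 on this toy data
```

Because whole trees are just index combinations over the partial-tree population, small building blocks can be reused across many candidate trees, which is what lets the method favor simple structures.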
Estimation of Distribution Algorithms, which employ probabilistic models to generate the next population, are promising new methods in the field of genetic and evolutionary algorithms. When conventional genetic and evolutionary algorithms are applied to Constraint Satisfaction Problems, it is well known that incorporating domain knowledge of the problem is quite effective. In this paper, we construct a memetic algorithm that combines an Estimation of Distribution Algorithm with a repair method. Experimental results on general Constraint Satisfaction Problems demonstrate the effectiveness of the proposed method.
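The combination described above can be sketched compactly: a PBIL-style distribution-estimation step samples candidate assignments from a probability vector, a repair step uses knowledge of the constraints to fix violations, and the model is updated toward the best repaired individual. The toy CSP (adjacent binary variables must differ), the PBIL choice, and all parameter values are assumptions for illustration, not the paper's actual algorithm; on this easy constraint the repair pass alone removes every violation, whereas on harder CSPs it would only partially fix candidates.

```python
import random

N = 8                                             # binary variables x0..x7
constraints = [(i, i + 1) for i in range(N - 1)]  # constraint: x_i != x_{i+1}

def violations(x):
    return sum(x[i] == x[j] for i, j in constraints)

def repair(x):
    # Domain knowledge: flip the right-hand variable of each violated constraint.
    x = list(x)
    for i, j in constraints:
        if x[i] == x[j]:
            x[j] = 1 - x[j]
    return x

def eda_with_repair(generations=30, pop=20, lr=0.3, seed=0):
    rng = random.Random(seed)
    p = [0.5] * N                 # probabilistic model: P(x_k = 1)
    best = None
    for _ in range(generations):
        population = [repair([int(rng.random() < p[k]) for k in range(N)])
                      for _ in range(pop)]
        leader = min(population, key=violations)
        if best is None or violations(leader) < violations(best):
            best = leader
        # Shift the model toward the best repaired individual (PBIL update).
        p = [(1 - lr) * p[k] + lr * leader[k] for k in range(N)]
    return best

solution = eda_with_repair()
print(violations(solution))   # 0: every adjacent pair differs
```

The memetic character lies in the interleaving: the probabilistic model does the global search while the repair method applies local, knowledge-driven correction to every sampled individual before evaluation.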
In this paper, we describe the completeness of a calculation procedure for logic programs. The procedure combines two procedures: a replacement procedure that replaces atoms in the goal by the bodies, or the negations of the bodies, of rules in the program, and a transformation procedure that transforms equations into disjunctive normal form (DNF) equivalent under Clark's Equational Theory (CET). Combining the replacement of atoms in the goal with logical formulae determined from the program and the transformation of equations into DNF equivalent under CET is a method by which procedures capable of expressing answers in DNF can be built, so it is a leading method for expressing answers in a form including negation. Some procedures based on this method have been devised, and their calculation capabilities have been shown by applying the theory of completed programs. The procedure that uses the bodies or the negations of the bodies of rules for replacement has higher calculation capability, and is intuitively more natural, than those procedures. Therefore, clarifying the calculation capability of this procedure is considered an important subject for research into calculation procedures of logic programs capable of expressing answers in a form including negation. Since the completeness is established by treating the implication symbol as distinct from the usual implication symbol and by interpreting logic programs in three-valued logic, examples supporting this viewpoint are also described.
This paper proposes an agent-community-based information retrieval method, which uses agent communities to manage and look up information related to users. An agent works as a delegate of its user and searches for information that the user wants by communicating with other agents. The communication between agents is carried out in a peer-to-peer computing architecture. To retrieve information relevant to a user query, an agent uses two histories: a query/retrieved-document history (Q/RDH) and a query/sender-agent history (Q/SAH). The former is a list of pairs of a query and the retrieved documents, where the queries were sent by the agent itself. The latter is a list of pairs of a query and its sender agent, showing ``who sent what query to the agent''; this is useful for finding new information sources.
Making use of the Q/SAH is expected to produce a collaborative filtering effect, which gradually creates virtual agent communities in which agents with the same interests stay together. Our hypothesis is that a virtual agent community reduces the communication load required to perform a search. As an agent receives more queries, it acquires more links to new knowledge; from this behavior, a ``give and take'' (or positive feedback) effect seems to emerge among agents.
We implemented this method on the multi-agent framework Kodama, which has been developed in our laboratory, and conducted preliminary experiments to test the hypothesis. The empirical results showed that the method was much more efficient than a naive method that relies only on broadcasting to look up a target agent.
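The two histories above can be sketched as simple records: Q/RDH pairs each query the agent itself issued with the documents it retrieved, while Q/SAH records which agent sent which query ("who sent what"). Looking up past senders of similar queries then suggests new information sources. The keyword-overlap similarity below is a naive illustrative stand-in, not the paper's retrieval model, and all names are hypothetical.

```python
class AgentHistories:
    def __init__(self):
        self.qrdh = []   # Q/RDH: (query, retrieved_documents) pairs
        self.qsah = []   # Q/SAH: (query, sender_agent) pairs

    def record_retrieval(self, query, docs):
        self.qrdh.append((query, docs))

    def record_incoming(self, query, sender):
        self.qsah.append((query, sender))

    def candidate_sources(self, query):
        """Agents whose past queries share at least one word with this one."""
        words = set(query.split())
        return {sender for q, sender in self.qsah if words & set(q.split())}

h = AgentHistories()
h.record_retrieval("reinforcement learning", ["doc1", "doc3"])
h.record_incoming("profit sharing learning", "agent_B")
h.record_incoming("image retrieval", "agent_C")
print(h.candidate_sources("reinforcement learning"))   # {'agent_B'}
```

Routing new queries to such candidate agents, rather than broadcasting them, is what allows the virtual communities to reduce communication load as the histories grow.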
In this paper, we describe a generic image classification system with a mechanism for automatic knowledge acquisition from the World Wide Web. Owing to the recent spread of digital imaging devices, the demand for generic classification and recognition of images of various real-world scenes is growing. Realizing this requires visual knowledge about many kinds of scenes, and hence a large number of learning images must be prepared. For this reason, commercial image collections such as the Corel Image Gallery are widely used in conventional studies on generic image classification. However, they are not suitable as learning images for generic image classification, since they do not include sufficiently diverse images. Instead of commercial image collections, we therefore propose using as learning images a large number of images gathered from the World Wide Web by a Web image-gathering system. Images on the Web generally have huge diversity, and we take advantage of this diversity for a generic image classification task, which is the first such attempt in this line of work. Through experiments, we show that using Web images as learning images is effective and promising for generic image classification.
In this paper, we propose the Future View, a system that assists a user's everyday Web browsing. The Future View prefetches Web pages based on the user's browsing strategies and presents them to the user in order to assist Web browsing. To learn the user's browsing strategy, the Future View uses two types of learning classifier systems: a content-based classifier system for content change patterns and an action-based classifier system for the user's action patterns. The results of learning are applied to crawling by Web robots, and the gathered Web pages are presented to the user through a Web browser interface. We experimentally show the effectiveness of navigation using the Future View.