The Job-shop Scheduling Problem (JSP) is one of the most difficult benchmark problems. GA approaches often fail to find the global optimum because of the deceptive UV-structure of JSPs. In this paper, we introduce a novel GA framework, the Innately Split Model (ISM), which prevents the UV-phenomenon, and discuss its power in particular. We then analyze the structure of JSPs with the help of the UV-structure hypothesis, and finally we show ISM's excellent performance on JSP.
There have been several previous studies on measuring the semantic
similarity between words whose concepts are represented as points in a
multi-dimensional vector space acquired from text data such as
electronic dictionaries or text corpora. A central problem in these
studies is how to select orthonormal basis vectors for the space which
represents attributes of the words. We propose a method of building
the space by combining two representative methods, one using singular
value decomposition and the other using the contents of a thesaurus.
The proposed method was evaluated both for the purposes of similar
word retrieval and for document retrieval. The evaluations showed
that the proposed combination is more effective than either of the
original methods alone for both of these tasks.
The purpose of reinforcement learning is, in general, to learn an optimal policy. In two-player games such as Othello, however, it is also important to acquire a penalty-avoiding policy. In this paper, we focus on the formation of a penalty-avoiding policy based on the Penalty Avoiding Rational Policy Making algorithm [Miyazaki 01]. In applying it to large-scale problems, we are confronted with the curse of dimensionality. We introduce several ideas and heuristics to overcome the combinatorial explosion in large-scale problems. First, we propose an algorithm that saves memory by computing state transitions. Second, we describe how to restrict exploration using two types of knowledge: a KIFU (game record) database and an evaluation function. We show that our learning player can always defeat the well-known Othello program KITTY.
In this paper, we propose a new method for dealing with continuously varying quantities in the situation calculus, based on the concepts of nonstandard analysis. The essential point of the method is to devise a new model, called the nonstandard situation calculus, which is an interpretation of the situation calculus over the set of hyperreals. This nonstandard model allows discrete but uncountable (hyperfinite) state transitions, so that we can describe and reason about continuous dynamics that are usually treated with differential equations. In this enlarged perspective of the nonstandard situation calculus, we discuss infinitesimal actions, infinite plans, and an abnormality theory for temporal prediction based on differentiability.
Knowledge discovery in databases (KDD) has been studied intensively in recent years. In KDD, inductive classifier learning methods developed in statistics and machine learning have been used to extract classification rules from databases. Although KDD often has to deal with large databases, many of the previous classifier learning methods are not suitable for them: they were designed under the assumption that any datum in a database is accessible on demand, and they usually need to access each datum several times during learning. Consequently, they require a huge memory space or a large I/O cost for accessing storage devices. In this paper, we propose a classifier learning method, called CIDRE, in which data summaries are constructed and classifiers are learned from the summaries. The method is realized by a clustering method, called MCF-tree, which is an extension of the CF-tree proposed by Zhang et al. With this method, we can specify the size of the memory space occupied by the data summaries, and the database is swept only once to construct them. In addition, new instances can be inserted into the summaries incrementally. Thus, the method possesses properties that are desirable for dealing with large databases. We also show empirical results, which indicate that our method performs very well in comparison with C4.5 and naive Bayes, and that the extension from CF-tree to MCF-tree is indispensable for achieving high classification accuracy.
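The clustering-feature idea underlying CF-trees (and hence the MCF-tree extension) can be sketched as follows. This is a flat, simplified stand-in for the tree structure, not the CIDRE implementation; the threshold and data points are illustrative assumptions.

```python
import math

# A clustering feature (CF) summarizes a set of points by (N, LS, SS):
# the count, the per-dimension linear sum, and the sum of squared norms.
# All cluster statistics can be updated incrementally from these alone.
class CF:
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, x):                 # single-pass, incremental update
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):                 # derivable from (N, LS, SS) alone
        c = self.centroid()
        val = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(val, 0.0))

# Sweep the data once, assigning each point to the nearest CF within a
# distance threshold, or opening a new CF otherwise.
def summarize(points, threshold):
    cfs = []
    for x in points:
        best = None
        for cf in cfs:
            d = math.dist(x, cf.centroid())
            if d <= threshold and (best is None or d < math.dist(x, best.centroid())):
                best = cf
        if best is None:
            best = CF(len(x))
            cfs.append(best)
        best.add(x)
    return cfs

cfs = summarize([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)], threshold=1.0)
```

The single sweep and the fixed-size summaries are exactly the properties the abstract highlights as desirable for large databases.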
The goal of our research is to develop a software environment in which users can select appropriate Knowledge Discovery systems from various candidates and compare the obtained results with one another. This paper reports a prototype of such an environment, named EVLD (Environment for Various Logics of Discovery). The key idea in constructing EVLD is its three-level architecture: we adopt the logic of discovery proposed by Plotkin for the middle level and put every Knowledge Discovery system on the lowest level.
By designing the transformations between the two levels, users of EVLD obtain the output of every discovery system in the form of logic programs, and can compare the outputs by comparing those programs. Moreover, the logic programs are stored in a rule base and can be used as background theory for further trials of the discovery systems.
Though the logic of discovery was invented thirty years ago out of philosophical consideration of discovery, it covers Knowledge Discovery from relational databases in first-order logic. Constructing EVLD therefore connects recent Knowledge Discovery systems with Plotkin's philosophical considerations.
Recently, ontologies have been expected to contribute to the sharing and reuse of knowledge, and a great deal of research on ontology has been carried out. However, useful methodologies for building ontologies have not yet been established, in spite of a clear understanding of their necessity. The crucial point of building ontologies is to articulate the concepts in the target field. If a concept is defined in a manner that depends on a particular context, it is difficult to reuse it in other contexts. Therefore, it is important to distinguish context-dependent concepts from context-independent ones. The difficulty of this discrimination calls for the development of a methodology and a guide system for building ontologies.
This paper proposes AFM (Activity-First Method), a methodology for building domain ontologies based on task analysis, and an ontology-building guide system based on AFM. The guide system supports ontology building by providing the ontology author with building steps and by managing intermediate results. In the first half of this paper, we take an ontology of oil refinery plant operation tasks as an example and discuss the articulation of role concepts, which represent the conceptual categories of the roles that things play in a particular context. This consideration serves as the foundation of the guide system. In the second half, we show how the guide system supports an ontology building process.
Several researchers have indicated that posing arithmetical word problems is an important way to learn arithmetic. However, learning by problem posing is not common in classroom lectures because it is hard for a teacher to manage: for example, merely to judge whether each posed problem is correct, the teacher has to examine every problem individually.
In this paper, we describe an Intelligent Learning Environment (ILE) that realizes learning by problem posing. A learner poses problems through the interface provided by the ILE, which has a function to diagnose the posed problems. Using the results of the diagnosis, the ILE helps the learner correct wrong problems, or leads her/him to the next step of problem posing. We evaluated the ILE in an elementary school with 4th-grade students, and we report the results of the evaluation.
In the ILE, the interface was implemented in Java and the diagnosis module in Prolog, so it can be used on the World Wide Web. The current environment deals with simple arithmetical word problems solved by a single addition or subtraction.
With the rapidly increasing number of proteins whose three-dimensional (3D) structures are known, the protein structure database is a key element in many attempts to derive knowledge of the structure-function relationships of proteins. In this work, the authors have developed a software tool to assist in constructing a 3D protein motif dictionary closely related to the PROSITE sequence motif database. In PROSITE, a structural feature called a motif is described by a sequence pattern of amino acid residues, written in the regular-expression-like notation defined by the database. The present system allows us to automatically find the related sites in all the 3D protein structures taken from a structure database such as the Protein Data Bank (PDB), and to compile a dictionary of the 3D motifs related to the PROSITE sequence motif patterns. A computational trial was carried out on a subset of the PDB's structure data files. The structural feature analysis performed with the tool showed that many different 3D motif patterns can share a single PROSITE sequence pattern. For this reason, the authors also classified the 3D motif patterns into several groups on the basis of a distance similarity matrix and determined a representative pattern for each group in preparing the dictionary. The usefulness of this additional approach for preparing the 3D motif dictionary is also discussed with an illustrative example.
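Locating PROSITE motif sites in a sequence amounts to translating the PROSITE pattern notation into an ordinary regular expression and scanning. The sketch below handles only a simplified subset of the notation, and the pattern and sequence are made up for illustration; they are not an actual PROSITE entry or PDB record.

```python
import re

# Convert a (simplified) PROSITE pattern to a Python regular expression.
# Handles: x (any residue), [..] alternatives, {..} exclusions,
# (n) / (n,m) repetition counts, and '-' element separators.
# Anchors ('<', '>') are omitted in this sketch.
def prosite_to_regex(pattern):
    out = []
    for elem in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", elem)
        count = ""
        if m:
            elem = m.group(1)
            count = ("{%s}" % m.group(2) if not m.group(3)
                     else "{%s,%s}" % (m.group(2), m.group(3)))
        if elem == "x":
            out.append("." + count)
        elif elem.startswith("[") or elem.startswith("{"):
            body = elem[1:-1]
            out.append(("[%s]" if elem[0] == "[" else "[^%s]") % body + count)
        else:
            out.append(elem + count)
    return "".join(out)

# Illustrative zinc-finger-like pattern (not a real PROSITE entry).
rx = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
hit = re.search(rx, "MKCABCDEFLGG")        # matches "CABCDEFL"
```

The described system would perform this kind of scan over sequences extracted from PDB entries, then map each sequence hit back to its 3D coordinates.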
This paper describes an automatic monitoring system that constantly checks for partial updates in Web pages and notifies the user of them. Although frequent updating is one of the most important advantages of the WWW, constantly checking pages by hand imposes a heavy cognitive load. Applications that automatically check for updates of Web pages have therefore been developed, but they cannot deal with partial updates, such as a change in a particular cell of a table in a Web page. Hence we developed an automatic monitoring system that checks for such partial updates. A user gives the system, as training examples, the regions of a Web page whose updates he or she wants to know about, and the system learns rules to identify the partial updates by classification learning. Thanks to this learning, the user does not need to describe the rules directly. We implemented the system and present some execution examples.
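The core of partial-update detection can be sketched as comparing only a user-selected region rather than the whole page. The HTML snippets and the hand-written pattern below are illustrative assumptions; the point of the described system is precisely that such region-identifying rules are learned from training examples instead of being written by the user.

```python
import re

# Extract one user-selected region (here, a single table cell matched by
# a pattern) from an HTML page.
def extract_region(html, pattern):
    m = re.search(pattern, html, re.DOTALL)
    return m.group(1).strip() if m else None

# A partial update is a change in the watched region only; changes
# elsewhere on the page are ignored.
def region_updated(old_html, new_html, pattern):
    return extract_region(old_html, pattern) != extract_region(new_html, pattern)

old_page = "<table><tr><td id='price'>100</td><td>stock</td></tr></table>"
new_page = "<table><tr><td id='price'>120</td><td>stock</td></tr></table>"
cell = r"<td id='price'>(.*?)</td>"
changed = region_updated(old_page, new_page, cell)   # True: the cell changed
```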
With the development of computing techniques, active mining, which combines active information gathering, user-centered mining, and active user reaction, has come to play an important role in mining novel knowledge from data. Active information gathering aims at effectively searching for relevant information and carrying out the preprocessing required before the user-centered mining process. Although there is a large body of research on mining biomedical literature databases, the importance of active information gathering in such research has so far received little attention. In this paper, we consider the problem of selecting articles of experts' interest from a literature database by making use of existing databases and machine learning techniques. This problem can be regarded as an active information gathering problem and is useful from the viewpoint of active mining. The results show the effectiveness of using existing databases in reducing the number of training data required by the learning system while maintaining the quality of the obtained documents.
In this paper, we perform a worst-case analysis of rule discovery based on generality and accuracy. A rule is defined as a probabilistic constraint on the true assignment of the class attribute for the examples it covers. In data mining, rules can be considered to represent an important class of discovered patterns. We accomplish the aforementioned objective by extending a preliminary version of PAC learning, which represents a worst-case analysis for classification. Our analysis consists of two cases: the case in which we try to avoid finding a bad rule, and the case in which we try to avoid overlooking a good rule. Discussions of related work are also provided concerning PAC learning, multiple comparison, the analysis of association rule discovery, and simultaneous reliability evaluation of a discovered rule.
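To give a feel for this style of worst-case analysis, the following sketch computes a generic Hoeffding-style sample-size bound with a union-bound correction for testing many candidate rules (the multiple-comparison issue mentioned above). This is a standard textbook bound offered as an illustration, not the bound derived in the paper, and the numeric parameters are arbitrary.

```python
import math

# With n examples covered by a rule, the empirical accuracy deviates from
# the true accuracy by more than eps with probability at most
# 2 * exp(-2 * n * eps**2) (Hoeffding). Testing num_rules candidate rules
# and applying a union bound, n below suffices so that, with probability
# at least 1 - delta, NO tested rule deviates by more than eps.
def required_examples(eps, delta, num_rules=1):
    return math.ceil(math.log(2 * num_rules / delta) / (2 * eps ** 2))

n_single = required_examples(eps=0.05, delta=0.05)
n_many = required_examples(eps=0.05, delta=0.05, num_rules=1000)
```

Note how mildly the bound grows with the number of candidate rules (logarithmically), which is what makes worst-case guarantees feasible for discovery over large rule spaces.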
Continuous processes in chemical and biotechnical plants constantly generate large amounts of time-series data. However, since conventional process models are described as sets of control models, it is difficult for them to explain complicated, dynamic plant behaviors. Against this background, this research proposes a novel method for developing a process response model from continuous time-series data.
The method consists of the following phases: 1) collect continuous process data at each tag point in a target plant; 2) normalize the data into the interval between zero and one; 3) find the delay time that maximizes the correlation between two given time series; 4) select tags with the higher correlations; 5) develop a process response model that describes the relations among the process data using the delay times and correlation values; 6) develop a process prediction model from the data of several tag points using a neural network; 7) discover control rules from the process prediction model using a Learning Classifier System.
The main contribution of this research is to establish a method for mining a set of meaningful control rules from the Learning Classifier System using the Minimum Description Length criterion. The proposed method has been applied to an actual process of a biochemical plant and has demonstrated its validity and effectiveness.
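The delay-time phase of the method, finding the lag that maximizes the correlation between two normalized series, can be sketched as follows. The synthetic series stands in for two plant tags; the data, the lag window, and the plain Pearson correlation are illustrative assumptions, not taken from the paper.

```python
# Pearson correlation between two equal-length series.
def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

# Shift y back by each candidate lag and keep the lag whose aligned
# overlap has the highest correlation with x.
def best_delay(x, y, max_lag):
    return max(range(max_lag + 1),
               key=lambda lag: correlation(x[:len(x) - lag], y[lag:]))

# y is x delayed by 3 samples (plus a constant offset, which does not
# affect the correlation).
x = [0.0, 0.1, 0.5, 0.9, 1.0, 0.7, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0]
y = [0.2] * 3 + [v + 0.2 for v in x[:len(x) - 3]]
lag = best_delay(x, y, max_lag=5)
```

Applied pairwise over the plant's tags, the recovered lags and correlation values are what phases 4) and 5) use to select tags and build the process response model.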
This paper presents parallel CAMLET, a platform for the automatic composition of inductive applications using method repositories that organize many inductive learning methods. After implementing CAMLET on a UNIX parallel machine in Perl and C, we carried out case studies of constructing inductive applications for eight different data sets from the StatLog repository. To find an efficient search method, we conducted the following experiments: a random search, a GA-based search, and two hybrid searches that unify each global search with a local search using meta-rules for refining a specification. These experiments showed that the hybrid searches work better than the other search methods. Furthermore, by comparing the accuracy of inductive applications composed by parallel CAMLET with that of twenty-four popular inductive algorithms, we show that parallel CAMLET supports a user in model selection within KDD engineering processes.