Transactions of the Japanese Society for Artificial Intelligence

Regular

Technical Papers

On Autonomous Coordination of Learning Biases by an Agent with a Vocabulary Learning Mechanism

Shuji Shinohara, Ryo Taguchi, Takashi Hashimoto, Kouichi Katsurada, Ts ...

2007 Volume 22 Issue 2 Pages 103-114
Published: 2007
Released on J-STAGE: January 17, 2007

DOIhttps://doi.org/10.1527/tjsai.22.103

JOURNAL FREE ACCESS

Show abstractHide abstract

Many studies have reported learning biases in children's learning of object name,

View full abstract

Download PDF (482K)
A Supervised Automatic Evaluation for Summarization with a Voted Regression Model

Tsutomu Hirao, Manabu Okumura, Norihito Yasuda, Hideki Isozaki

2007 Volume 22 Issue 2 Pages 115-126
Published: 2007
Released on J-STAGE: January 17, 2007

DOIhttps://doi.org/10.1527/tjsai.22.115

JOURNAL FREE ACCESS

Show abstractHide abstract

In order to improve automatic summarization systems,

View full abstract

Download PDF (1642K)
Applying Genetic Programming with Substructure Discovery to a Traffic Signal Control Problem

Juncichi Kumagai, Yasuo Ojima, Souichi Takashige, Yoshitaka Kameya, Ta ...

2007 Volume 22 Issue 2 Pages 127-139
Published: 2007
Released on J-STAGE: January 17, 2007

DOIhttps://doi.org/10.1527/tjsai.22.127

JOURNAL FREE ACCESS

Show abstractHide abstract

Nowadays the increase of traffic causes numerous serious traffic jams, and traffic signals are desired to work adaptively for dynamic traffic flows. In this paper, we view such a problem of traffic signal control as a multi-agent problem where each signal has a controlling agent, and aim to make the agents work cooperatively depending on the traffic status. To build such an agent program automatically, we introduce genetic programming (GP), an evolutionary method for program construction. In GP, it is known as important to encapsulate the substructures of a program which leads to higher fitness to the environment, and we propose a new encapsulation method using an efficient technique for discovering frequent substructures, which has been recently proposed in the data mining field. We also conducted a simulation with a real traffic data, and confirmed that GP with our encapsulation method outperforms the normal GP. It is also observed that the best individual has a communication part that chooses an appropriate communication area and adapts to the traffic status.

View full abstract

Download PDF (415K)

Special Issue: Data Mining and Statistical Science

Technical Papers

A Spectrum Tree Kernel

Tetsuji Kuboyama, Kouichi Hirata, Hisashi Kashima, Kiyoko F.Aoki-Kinos ...

2007 Volume 22 Issue 2 Pages 140-147
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.140

JOURNAL FREE ACCESS

Show abstractHide abstract

Learning from tree-structured data has received increasing interest with the rapid growth of tree-encodable data in the World Wide Web, in biology, and in other areas. Our kernel function measures the similarity between two trees by counting the number of shared sub-patterns called tree q-grams, and runs, in effect, in linear time with respect to the number of tree nodes. We apply our kernel function with a support vector machine (SVM) to classify biological data, the glycans of several blood components. The experimental results show that our kernel function performs as well as one exclusively tailored to glycan properties.

View full abstract

Download PDF (323K)
Efficient Variable Selection Method for Exposure Variables on Binary Data

Manabu Ohno, Tomoyuki Tarumi

2007 Volume 22 Issue 2 Pages 148-155
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.148

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we propose a new variable selection method for "robust" exposure variables. We define "robust" as property that the same variable can select among original data and perturbed data. There are few studies of effective for the selection method. The problem that selects exposure variables is almost the same as a problem that extracts correlation rules without robustness. [Brin 97] is suggested that correlation rules are possible to extract efficiently using chi-squared statistic of contingency table having monotone property on binary data. But the chi-squared value does not have monotone property, so it's is easy to judge the method to be not independent with an increase in the dimension though the variable set is completely independent, and the method is not usable in variable selection for robust exposure variables. We assume anti-monotone property for independent variables to select robust independent variables and use the apriori algorithm for it. The apriori algorithm is one of the algorithms which find association rules from the market basket data. The algorithm use anti-monotone property on the support which is defined by association rules. But independent property does not completely have anti-monotone property on the AIC of independent probability model, but the tendency to have anti-monotone property is strong. Therefore, selected variables with anti-monotone property on the AIC have robustness. Our method judges whether a certain variable is exposure variable for the independent variable using previous comparison of the AIC. Our numerical experiments show that our method can select robust exposure variables efficiently and precisely.

View full abstract

Download PDF (351K)
Symmetric Item Set Mining Method Using Zero-suppressed BDDs and Application to Biological Data

Shin-ichi Minato, Kimihito Ito

2007 Volume 22 Issue 2 Pages 156-164
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.156

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we present a method of finding symmetric items in a combinatorial item set database. The techniques for finding symmetric variables in Boolean functions have been studied for long time in the area of VLSI logic design, and the BDD (Binary Decision Diagram) -based methods are presented to solve such a problem. Recently, we have developed an efficient method for handling databases using ZBDDs (Zero-suppressed BDDs), a particular type of BDDs. In our ZBDD-based data structure, the symmetric item sets can be found efficiently as well as for Boolean functions. We implemented the program of symmetric item set mining, and applied it to actual biological data on the amino acid sequences of influenza viruses. We found a number of symmetric items from the database, some of which indicate interesting relationships in the amino acid mutation patterns. The result shows that our method is helpful for extracting hidden interesting information in real-life databases.

View full abstract

Download PDF (566K)
Frequent Closed Item Set Mining Based on Zero-suppressed BDDs

Shin-ichi Minato, Hiroki Arimura

2007 Volume 22 Issue 2 Pages 165-172
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.165

JOURNAL FREE ACCESS

Show abstractHide abstract

Frequent item set mining is one of the fundamental techniques for knowledge discovery and data mining. In the last decade, a number of efficient algorithms for frequent item set mining have been presented, but most of them focused on just enumerating the item set patterns which satisfy the given conditions, and it was a different matter how to store and index the result of patterns for efficient data analysis. Recently, we proposed a fast algorithm of extracting all frequent item set patterns from transaction databases and simultaneously indexing the result of huge patterns using Zero-suppressed BDDs (ZBDDs). That method, ZBDD-growth, is not only enumerating/listing the patterns efficiently, but also indexing the output data compactly on the memory to be analyzed with various algebraic operations. In this paper, we present a variation of ZBDD-growth algorithm to generate frequent closed item sets. This is a quite simple modification of ZBDD-growth, and additional computation cost is relatively small compared with the original algorithm for generating all patterns. Our method can conveniently be utilized in the environment of ZBDD-based pattern indexing.

View full abstract

Download PDF (760K)
Mining Classification Rules from Multidimensional Structured Databases

Tomonobu Ozaki, Tomoki Watanuma, Takenao Ohkawa

2007 Volume 22 Issue 2 Pages 173-182
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.173

JOURNAL FREE ACCESS

Show abstractHide abstract

Recently, the research area of mining in structured data has been actively studied. However, since most techniques for structured data mining so far specialize in mining from single structured data, it is difficult for these techniques to handle more realistic data which is related to various types of attribute and which consists of plural kinds of structured data. Since such kind of data is expected to be going to rapidly increase, we need to establish a flexible and highly accurate technique that can inclusively treat such kind of data. In this paper, as one of the techniques to deal with such kind of data, we propose data mining algorithms of mining classification rules in multidimensional structured data. First, an algorithm with two pruning capabilities of mining correlated patterns is introduced. Then, top-k multidimensional correlated patterns are discovered by using this algorithm repeatedly in the fashion like a beam search. We also show the algorithms for constructing classifiers based on the discovered patterns. Experiments with real world data were conducted to assess the effectiveness of the proposed algorithms. The results show that the proposed algorithms can construct comprehensible and accurate classifiers within a reasonable running time.

View full abstract

Download PDF (264K)
Casualty Insurance Pure Premium Estimation Using Two-Stage Regression Tree

Kumiko Nishi, Ichiro Takeuchi

2007 Volume 22 Issue 2 Pages 183-190
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.183

JOURNAL FREE ACCESS

Show abstractHide abstract

We study a regression tree algorithm tailored to casualty insurance pure premium estimation problems. Casualty insurance premium is mainly determined by the expected amount that the insurance companies have to pay for the contract. Therefore, casualy insurance companies have to estimate the expected insurance amount on the basis of insurance risk factors. This problem is formulated as a regression problem, i.e. estimation of conditional mean E[Y|x], where Y is insurance amounts and x is risk factors. In this paper, we aim to implement the regression problem in regression tree framework. The difficulty of the problem lies in the fact that the distribution of insurance amount P(Y|x) is highly skewed and exhibits a long-tail toward positive direction. Conventional least-square-error regression tree algorithm is notoriously unstable under such long-tailed error distribution. On the other hand, several types of robust regression trees, such as least-absolute-error regression tree, are neither appropriate in this situation because they yields significant bias to conditional mean E[Y|x]. In this paper, we propose a two-stage tree fitting algorithm. In the first stage, the algorithm constructs a quantile tree, a kind of robust regression tree, which is stable but biased to conditional mean E[Y|x]. In the second stage, the algorithm corrects the bias using least-square error regression tree. We discuss the theoretical background of the algorithm and empirically investigate the performances. We applied the proposed algorithm to a car insurance data set of 318,564 records provided from a north-american insurance company and obtained significantly better results than conventional regression tree algorithm.

View full abstract

Download PDF (307K)
Fast Reachability Test on DAGs for XML

Yusaku Nakamura, Tetsuya Maita, Hiroshi Sakamoto

2007 Volume 22 Issue 2 Pages 191-199
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.191

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose an efficient algorithm for deciding the reachability between any nodes on XML data represented by connected directed graphs. We develop a technique to reduce the size of the reference table for the reachability test. Using the small table and the standard range labeling method for rooted ordered trees, we show that our algorithm answers almost queries in a constant time preserving the space efficiency and a reasonable preprocessing time.

View full abstract

Download PDF (620K)
A Knowledge Discovery from POS Data using State Space Models

An Analysis of Change in Price Elasticities by New Product's Entry to Market

Tadahiko SATO, Tomoyuki HIGUCHI

2007 Volume 22 Issue 2 Pages 200-208
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.200

JOURNAL FREE ACCESS

Show abstractHide abstract

The number of competing-brands changes by new product's entry. The new product introduction is endemic among consumer packaged goods firm and is an integral component of their marketing strategy. As a new product's entry affects markets, there is a pressing need to develop market response model that can adapt to such changes. In this paper, we develop a dynamic model that capture the underlying evolution of the buying behavior associated with the new product. This extends an application of a dynamic linear model, which is used by a number of time series analyses, by allowing the observed dimension to change at some point in time. Our model copes with a problem that dynamic environments entail: changes in parameter over time and changes in the observed dimension. We formulate the model with framework of a state space model. We realize an estimation of the model using modified Kalman filter/fixed interval smoother. We find that new product's entry (1) decreases brand differentiation for existing brands, as indicated by decreasing difference between cross-price elasticities; (2) decreases commodity power for existing brands, as indicated by decreasing trend; and (3) decreases the effect of discount for existing brands, as indicated by a decrease in the magnitude of own-brand price elasticities. The proposed framework is directly applicable to other fields in which the observed dimension might be change, such as economic, bioinformatics, and so forth.

View full abstract

Download PDF (630K)
A Parameterized Probabilistic Model of Network Evolution for Supervised Link Prediction

Hisashi Kashima, Naoki Abe

2007 Volume 22 Issue 2 Pages 209-217
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.209

JOURNAL FREE ACCESS

Show abstractHide abstract

We introduce a new approach to the problem of link prediction for network structured domains, such as the Web, social networks, and biological networks. Our approach is based on the topological features of network structures, not on the node features. We present a novel parameterized probabilistic model of network evolution and derive an efficient incremental learning algorithm for such models, which is then used to predict links among the nodes. We show some promising experimental results using biological network data sets.

View full abstract

Download PDF (372K)
Discovering Concepts from Word Co-occurrences with a Relational Model

Kenichi Kurihara, Yoshitaka Kameya, Taisuke Sato

2007 Volume 22 Issue 2 Pages 218-226
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.218

JOURNAL FREE ACCESS

Show abstractHide abstract

Clustering word co-occurrences has been studied to discover clusters as latent concepts. Previous work has applied the semantic aggregate model (SAM), and reports that discovered clusters seem semantically significant. The SAM assumes a co-occurrence arises from one latent concept. This assumption seems moderately natural. However, to analyze latent concepts more deeply, the assumption may be too restrictive. We propose to make clusters for each part of speech from co-occurrence data. For example, we make adjective clusters and noun clusters from adjective--noun co-occurrences while the SAM builds clusters of ``co-occurrences.'' The proposed approach allows us to analyze adjectives and nouns independently.

View full abstract

Download PDF (1284K)
Opinion Mining from Web Documents: Extraction and Structurization

Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto

2007 Volume 22 Issue 2 Pages 227-238
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.227

JOURNAL FREE ACCESS

Show abstractHide abstract

The task of opinion extraction and structurization is the key component of opinion mining, which allow Web users to retrieve and summarize people's opinions scattered over the Internet. Our aim is to develop a method for extracting opinions that represent evaluation of concumer products in a structured form. To achieve the goal, we need to consider some issues that are relevant to the extraction task: How the task of opinion extraction and structurization should be designed, and how to extract the opinions which we defined. We define an opinion unit consisting of a quadruple, that is, the opinion holder, the subject being evaluated, the part or the attribute in which it is evaluated, and the evaluation that expresses positive or negative assessment. In this task, we focus on two subtasks (a) extracting subject/aspect-evaluation relations, and (b) extracting subject/aspect-aspect relations, we approach each extraction task using a machine learning-based method. In this paper, we discuss how customer reviews in web documents can be best structured. We also report on the results of our experiments and discuss future directions.

View full abstract

Download PDF (325K)
Analysis Framework for Persuading Process and an Application to Debt-Collecting Conversation Logs

Wataru Sunayama, Katsutoshi Yada

2007 Volume 22 Issue 2 Pages 239-247
Published: 2007
Released on J-STAGE: January 25, 2007

DOIhttps://doi.org/10.1527/tjsai.22.239

JOURNAL FREE ACCESS

Show abstractHide abstract

The purpose of this research is to develop a framework to analyze the content and a process of persuading process and its application to the communication for the debt-collecting process. It is possible for us to understand how the skilled workers have used the keyword groups concerning the motivation to pay, the payment methods and the payment confirmation in their conversations, to model a persuading process. There is no research and method to deal with a large amount of conversation logs for discovering useful knowledge about a persuading process. In this paper, we were successful in discovering a part of the distinctive features of skilled workers in their conversations for the overdue payment collection, applying our method to communication data in a Japanese telecommunications company.

View full abstract

Download PDF (423K)

Register with J-STAGE for free!