Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Fast Extraction of Asynchronous and Useful Sequential Patterns Based on Information Content and Frequency
村田 順平, 岩沼 宏治, 大塚 尚貴
Journal, free access

2010, Volume 25, Issue 3, pp. 464-474

Abstract

In this paper, we propose new methods and present a system, called IFMAP, for extracting interesting patterns from long sequential data based on frequency and self-information, and we experimentally evaluate the proposed methods on a newspaper article corpus.
Sequential data mining methods based on frequency have been studied intensively so far. These methods, however, are neither effective nor valuable for applications in which almost all high-frequency patterns should be regarded as meaningless noise.
The concept of information gain is important for suppressing these noisy patterns, and its integration with a frequency criterion has already been studied. Yang et al. presented a sequential mining system, InfoMiner, which finds periodic synchronous patterns that are interesting and well balanced from the viewpoints of both frequency and self-information.
In this paper, we refine and extend the InfoMiner technologies in the following respects. First, our method can handle ordinary, i.e., asynchronous and non-periodic, patterns by using a sliding window mechanism, whereas InfoMiner cannot. Second, we give several combination measures for selecting valuable patterns based on frequency and self-information, whereas InfoMiner has just one measure, which, as we show in this paper, is neither appropriate nor effective for handling newspaper article corpora. Third, we propose a new unified method for pruning the search space of sequential data mining, which can be applied uniformly to any of the combination measures proposed here.
We conduct experiments to evaluate the effectiveness and efficiency of the proposed method with respect to runtime and the number of noisy patterns excluded.
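
The following is a minimal sketch, not the IFMAP implementation, of how a sliding-window frequency and self-information could be combined into a single score. The abstract does not specify the combination measures, so the product used here is an assumption made purely for illustration, and all function names (`self_information`, `window_support`, `score`) are hypothetical. Event probabilities are estimated as relative frequencies in the sequence, and a pattern counts as occurring in a window if it appears there as a (not necessarily contiguous) subsequence.

```python
import math
from collections import Counter


def self_information(event, counts, total):
    """Self-information -log2 p(event), with p estimated as relative frequency."""
    return -math.log2(counts[event] / total)


def window_support(sequence, pattern, width):
    """Number of sliding windows of `width` that contain `pattern`
    as an ordered, not necessarily contiguous, subsequence."""
    hits = 0
    for start in range(len(sequence) - width + 1):
        window = iter(sequence[start:start + width])
        # `item in window` consumes the iterator, so order is respected.
        if all(item in window for item in pattern):
            hits += 1
    return hits


def score(sequence, pattern, width):
    """Assumed illustrative measure: window support times the summed
    self-information of the pattern's events."""
    counts = Counter(sequence)
    info = sum(self_information(e, counts, len(sequence)) for e in pattern)
    return window_support(sequence, pattern, width) * info


tokens = ["the", "a", "the", "b", "the", "a", "the", "b"]
print(score(tokens, ("a", "b"), 3))      # rarer events, fewer windows
print(score(tokens, ("the", "the"), 3))  # frequent event, more windows
```

In this toy sequence the pattern made of the very frequent token occurs in more windows but carries less information per event, so under the assumed product measure it scores lower than the rarer pattern. That is the kind of noise suppression the abstract attributes to combining frequency with self-information.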

© 2010 JSAI (The Japanese Society for Artificial Intelligence)