2009, Volume 2009, Issue DMSM-A803, p. 11-
We propose a fast on-line approximation algorithm for extracting frequent subsequences from stream data. In an on-line algorithm, suppressing memory consumption is very important; an on-line algorithm therefore often takes the form of an approximation algorithm, in which the error ratio must be guaranteed to stay below a user-specified threshold. Our algorithm is based on the LOSSY COUNTING algorithm [1, 4], a well-known method for extracting frequent items from stream data. We extend LOSSY COUNTING to extract frequent subsequences from stream data by using head frequency, a measure of the frequency of subsequences. We estimate the approximation accuracy and the space complexity of the proposed algorithm. Memory consumption is of order (M/ϵ) log N, where M is the maximum number of subsequences obtained in each window, ϵ is the user-specified error ratio, and N is the length of the stream. Through experiments, we show that the proposed algorithm scales well with the length of the stream and keeps memory consumption below the estimated value.
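For orientation, the following is a minimal sketch of the standard item-level LOSSY COUNTING scheme that the abstract names as its starting point. It is not the proposed subsequence algorithm: head frequency is not defined in this abstract, so it is not implemented here. The function names `lossy_counting` and `frequent_items`, and the `support` parameter, are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple


def lossy_counting(
    stream: Iterable[Hashable], epsilon: float
) -> Tuple[Dict[Hashable, List[int]], int]:
    """Item-level Lossy Counting (illustrative sketch, not the paper's method).

    Returns (entries, n), where entries maps item -> [count, delta]:
    count is the number of occurrences seen since the entry was created,
    and delta bounds how many occurrences may have been missed before that.
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")

    width = math.ceil(1 / epsilon)  # number of items per window (bucket)
    entries: Dict[Hashable, List[int]] = {}
    n = 0  # number of items read so far

    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current window
        if item in entries:
            entries[item][0] += 1
        else:
            # A new entry may have been missed in up to (bucket - 1) windows.
            entries[item] = [1, bucket - 1]

        # At each window boundary, drop entries that cannot be frequent.
        if n % width == 0:
            stale = [k for k, (c, d) in entries.items() if c + d <= bucket]
            for key in stale:
                del entries[key]

    return entries, n


def frequent_items(
    entries: Dict[Hashable, List[int]], n: int, support: float, epsilon: float
) -> List[Hashable]:
    """Report items whose stored count is at least (support - epsilon) * n."""
    return [item for item, (count, _) in entries.items()
            if count >= (support - epsilon) * n]
```

Under these assumptions, calling `lossy_counting(list("abacadae"), 0.25)` uses windows of four items and keeps only the entry for `"a"`, since the items `b`, `c`, `d`, and `e` are each pruned at the window boundary that follows them. Each stored count underestimates the true count by at most ϵN, which is the kind of error guarantee the abstract refers to.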