Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Short Paper
Word Extraction from Newspaper Articles Based on Time Information for Event Sequence Mining
多田 知道, 岩沼 宏治, 鍋島 英知

2009, Volume 24, Issue 6, pp. 488-493

Abstract

This paper presents a new method for extracting important words from newspaper articles based on time-sequence information. Such word extraction plays an important role in event sequence mining. TF-IDF is a well-known method for ranking the importance of words in a document. However, TF-IDF does not consider the time information embedded in sequential textual data, which is characteristic of newspapers. In this research, we propose a new word-extraction method, called the TF-IDayF method, which takes time-sequence information into account and can extract important or characteristic words that express sequential events. The TF-IDayF method does not rely on the so-called burst phenomenon of topic-word occurrences, which has been studied by many researchers. The method is quite simple, yet effective and easy to compute in sequential text mining. Through several experiments, we evaluate the proposed method from three viewpoints: semantic, statistical, and data mining.
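
The abstract does not give the TF-IDayF formula, so the following is only an illustrative sketch and not the authors' definition. It shows standard TF-IDF next to a hypothetical day-based variant, assuming that the inverse factor counts the number of distinct days on which a word appears instead of the number of documents. The function names `tf_idf` and `tf_idayf`, the `(day, tokens)` input shape, and the toy articles are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF. docs is a list of token lists.

    Returns one {word: score} dict per document, using raw term
    frequency and log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        scores.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return scores


def tf_idayf(dated_docs):
    """Hypothetical day-based variant (an assumption, not the paper's formula).

    dated_docs is a list of (day, tokens) pairs. The inverse factor is
    log(number of days / number of days on which the word appears).
    """
    n_days = len({day for day, _ in dated_docs})
    word_days = {}
    for day, tokens in dated_docs:
        for w in set(tokens):
            word_days.setdefault(w, set()).add(day)
    scores = []
    for _, tokens in dated_docs:
        tf = Counter(tokens)
        scores.append(
            {w: tf[w] * math.log(n_days / len(word_days[w])) for w in tf}
        )
    return scores


# Toy data: three articles published on two days (invented for illustration).
dated_docs = [
    ("2009-01-01", ["election", "vote", "election"]),
    ("2009-01-01", ["election", "weather"]),
    ("2009-01-02", ["weather", "rain"]),
]

doc_scores = tf_idf([tokens for _, tokens in dated_docs])
day_scores = tf_idayf(dated_docs)
```

Under this assumed definition, a word such as "weather" that appears on every day receives a score of zero, while "election", which is concentrated on a single day, keeps a positive score even though it occurs in two of the three articles.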

© 2009 JSAI (The Japanese Society for Artificial Intelligence)