Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Paper
Automatic Collection and Monitoring of Blogs
Tomoyuki Nanno (南野 朋之), Yasuhiro Suzuki (鈴木 泰裕), Toshiaki Fujiki (藤木 稔明), Manabu Okumura (奥村 学)

2004, Vol. 19, No. 6, pp. 511-520

Abstract

Weblogs (blogs) are now regarded as a potentially useful source of information. Although there is no precise definition of a blog, blogs are generally understood to be personal web pages authored by a single individual and consisting of a sequence of dated entries, arranged in chronological order, that record the author's thoughts. In Japan, people were writing "diaries" on the web long before blog software became available. These web diaries are very similar to blogs in content, and people still write them without any blog software. As we will show, such hand-edited blogs are quite numerous in Japan, even though most people now think of a blog as a page published with one of the many variants of public-domain blog software. It is therefore quite difficult to collect Japanese blogs exhaustively, that is, to collect both blogs made with blog software and web diaries written as ordinary web pages. Motivated by this problem, we present a system that attempts to automatically collect and monitor Japanese blogs, including not only those made with blog software but also those written as ordinary web pages. To avoid depending on specific blog software, RSS feeds, or ping servers, our approach relies on extracting date expressions and analyzing HTML documents.
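To make the idea of detecting blog-like pages from date expressions more concrete, here is a minimal Python sketch. It assumes a simplified heuristic and is not the authors' actual method: it pulls date expressions out of a page's visible text and treats the page as blog-like if it contains several distinct dates in strictly chronological or reverse-chronological order. The regular expressions in `DATE_PATTERNS`, the helper names `TextCollector`, `extract_dates` and `looks_like_blog`, and the `min_entries=3` threshold are all hypothetical choices for illustration; only the standard-library `re`, `datetime` and `html.parser` modules are used.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Hypothetical date patterns; the paper's real date grammar is not given here.
DATE_PATTERNS = [
    re.compile(r"(\d{4})年\s*(\d{1,2})月\s*(\d{1,2})日"),  # e.g. 2004年5月12日
    re.compile(r"(\d{4})[/\-.](\d{1,2})[/\-.](\d{1,2})"),  # e.g. 2004/05/12
]


class TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_dates(html):
    """Return valid dates found in the page's visible text, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    found = []
    for pattern in DATE_PATTERNS:
        for m in pattern.finditer(text):
            y, mo, d = (int(g) for g in m.groups())
            try:
                found.append((m.start(), date(y, mo, d)))
            except ValueError:
                continue  # e.g. 2004/13/40 is not a real calendar date
    found.sort()  # merge matches from all patterns back into document order
    return [d for _, d in found]


def looks_like_blog(html, min_entries=3):
    """Hypothetical heuristic: enough distinct dates, ordered chronologically."""
    dates = extract_dates(html)
    # Collapse consecutive repeats (e.g. a date shown in an entry's header and footer).
    distinct = []
    for d in dates:
        if not distinct or distinct[-1] != d:
            distinct.append(d)
    if len(set(distinct)) < min_entries:
        return False
    pairs = list(zip(distinct, distinct[1:]))
    newest_first = all(a > b for a, b in pairs)
    oldest_first = all(a < b for a, b in pairs)
    return newest_first or oldest_first
```

For example, a page whose entry headings read 2004年5月12日, 2004年5月10日 and 2004/05/03 passes this check, while a page containing a single date does not. A practical system would also have to locate entry boundaries in the HTML structure and handle other date formats, such as era-based years, which this sketch ignores.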

© 2004 JSAI (The Japanese Society for Artificial Intelligence)