Abstract
Time-series data of keywords in blogs, news, and spam are analyzed using auto-correlation to find periodic topics in these information sources. To identify differences among the three sources, an algorithm is developed that detects periodic topics based on auto-correlation. With this algorithm, the distribution of keyword periods within each information source is obtained, and weekly and yearly keywords are extracted. The characteristics of each information source are illustrated in terms of these distributions and keywords. According to the results, periodic blog topics are TV programs, hobbies, and social events; periodic news topics are political and economic events; and periodic topics in spam are automatically copied-and-pasted email newsletters and affiliates.
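To make the idea of auto-correlation-based period detection concrete, the following is a minimal sketch, not the algorithm developed in this paper. It assumes a keyword's time series is a list of daily occurrence counts; the function names `autocorrelation` and `dominant_period` and the threshold of 0.5 are hypothetical choices made only for illustration.

```python
from typing import List, Optional


def autocorrelation(series: List[float], lag: int) -> float:
    """Sample auto-correlation of `series` at the given lag (normalized by lag-0 variance)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0 or lag >= n:
        # A constant series, or a lag longer than the series, carries no periodic signal.
        return 0.0
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def dominant_period(
    series: List[float], max_lag: int, threshold: float = 0.5
) -> Optional[int]:
    """Return the lag with the highest auto-correlation above `threshold`, or None.

    The threshold value is an illustrative assumption, not a value from the paper.
    """
    best_lag, best_value = None, threshold
    for lag in range(1, max_lag + 1):
        value = autocorrelation(series, lag)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag
```

Under these assumptions, a keyword that spikes once every seven days would be reported with a period of 7 (a weekly keyword), while a keyword with a constant daily count would yield no period.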