Abstract
With the recent rise in the popularity and size of social media, there is a growing need for systems that can extract useful information from this large amount of data. We address the problem of detecting influenza epidemics. Previous methods rely mainly on the frequencies of influenza-related words, and they suffer from noisy tweets that do not express influenza symptoms. To deal with this problem, this study proposes two methods. First, a sentence classifier judges whether a person has actually caught influenza. Second, an infectious model closes the time gap between people's web activity and the illness period. In the experiments, the combination of the two techniques achieved high performance (a correlation coefficient of 0.910 with the number of influenza patients). This result suggests that not only natural language processing but also the study of disease contributes to social-media-based surveillance.