2016 Volume 5 Issue 1 Pages 49-66
A number of studies have attempted to predict future events using information sources on the internet, but most of them rely on a single source or only a few. One reason for this small number is that different information sources cannot necessarily be assumed to share identical features in terms of content, which prevents treating all sources equally. To make use of as wide a variety of internet information sources as possible, it is essential to extract the differential features of each source and evaluate how the sources relate to one another. This paper assumes that news from major Japanese newspaper publishers represents business facts, and that blogs and message boards (hereafter "social media") contain evaluations of those facts. It then introduces an analytical framework for extracting the differential features of each information source. Specifically, the paper hypothesizes that differential features are added to the original news when social media quote it, and discusses how to extract these features numerically. It also discusses how features can exist at the level of the information source per se as well as at a more granular taxonomy level. Entropy term weighting and a Naive Bayes classifier are adopted to categorize content, and singular value decomposition is adopted to extract the differential features. These procedures are applied to data relating to Toyota Motors, and the results of the analysis not only support the hypothesis but also exemplify how differential features at the granular taxonomy level represent the company's current characteristics.
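As a rough illustration of two of the techniques named in the abstract, the sketch below applies entropy term weighting to a small term-by-document count matrix and then takes its singular value decomposition. It is not the paper's implementation: the log-entropy formula is one common variant chosen as an assumption, the function names `entropy_weights` and `log_entropy_matrix` are hypothetical, the counts are made-up toy data, and the Naive Bayes categorization step is omitted.

```python
import numpy as np

def entropy_weights(tf):
    """Global entropy weight per term for a terms x documents count matrix.

    Assumed variant: g_i = 1 + sum_j p_ij * log(p_ij) / log(n_docs),
    with p_ij = tf_ij / sum_j tf_ij. Requires at least two documents.
    """
    tf = np.asarray(tf, dtype=float)
    n_docs = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)  # total count of each term
    p = np.divide(tf, gf, out=np.zeros_like(tf), where=gf > 0)
    # p * log(p), using the convention 0 * log(0) = 0
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return 1.0 + plogp.sum(axis=1) / np.log(n_docs)

def log_entropy_matrix(tf):
    """Local weight log(1 + tf) scaled by the global entropy weight."""
    tf = np.asarray(tf, dtype=float)
    return np.log1p(tf) * entropy_weights(tf)[:, None]

# Hypothetical toy counts: rows are terms, columns are documents.
counts = np.array([
    [2, 2, 2, 2],  # spread evenly over all documents
    [4, 0, 0, 0],  # concentrated in a single document
    [1, 3, 0, 0],  # in between
])

weights = entropy_weights(counts)
weighted = log_entropy_matrix(counts)

# Singular value decomposition of the weighted matrix; the leading
# singular vectors summarize the dominant term/document patterns.
U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
```

Under this weighting, a term spread evenly across all documents receives a weight of 0 and a term confined to one document receives a weight of 1, so terms that discriminate between sources dominate the decomposition.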