Abstract
We examine the impressions people gain from reading newspaper articles and propose a method for quantifying the strength of these impressions. Our target impressions are limited to those represented by three bipolar scales: “Happy-Sad,” “Glad-Angry,” and “Peaceful-Strained.” To compute the strength of each impression as an “impression value,” that is, a real number between 1 and 7, it is generally necessary to quantify how strongly the features extracted from articles influence those impressions and to record that strength in an impression lexicon. An impression lexicon is usually constructed for each kind of impression and is used only to compute the impression value for that impression. In this study, we focus on the fact that the impressions are not independent of one another and adopt a new approach that recalculates each impression value from the values computed using every impression lexicon. First, we define “word unigrams” and “word bigrams” as the features extracted from articles and construct a total of six impression lexicons (i.e., one lexicon for each combination of the three kinds of impressions and the two kinds of features) from a newspaper database. Next, we apply multiple regression analysis for each impression, where the value computed from an article using each lexicon serves as one of the explanatory variables, and the average of the ratings that respondents gave each article on the corresponding impression scale in questionnaire surveys serves as the objective variable. This yields, for each impression, a multiple regression equation that represents the relationship between these variables. We also perform five-fold cross-validation using the data obtained in the surveys to verify the effectiveness of the proposed method. The results show that the average root-mean-square errors on held-out data (data not used for fitting) are 0.68, 0.57, and 0.62 for the three impressions, respectively.
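The regression and validation steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes that each article is represented by six lexicon-based impression values (three impressions times two feature types) and that these are the only explanatory variables, which the abstract does not state. The function names `fit_linear`, `predict`, and `five_fold_rmse` are hypothetical, and the data are randomly generated placeholders, so the printed error has no relation to the reported results.

```python
import numpy as np


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept; return [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted multiple regression equation to new rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def five_fold_rmse(X, y, k=5, seed=0):
    """Average root-mean-square error over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    rmses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_linear(X[train], y[train])
        err = predict(coef, X[test]) - y[test]
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(rmses))


# Synthetic placeholder data (an assumption, NOT the survey data):
# 100 articles, each with six lexicon-based values on the 1-7 scale.
rng = np.random.default_rng(42)
X = rng.uniform(1.0, 7.0, size=(100, 6))

# Hypothetical mean ratings for one impression scale, clipped to 1-7.
weights = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
y = np.clip(X @ weights + rng.normal(0.0, 0.5, size=100), 1.0, 7.0)

rmse = five_fold_rmse(X, y)
print(f"average held-out RMSE on synthetic data: {rmse:.2f}")
```

In practice, one such model would be fitted per impression scale, with `y` replaced by the mean questionnaire ratings for that scale.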