Statistical classification and visualization based on a varying coefficient model for longitudinally observed text data

和泉 志津恵; 佐藤 健一; 川野 徳幸

doi:10.20551/jscswabun.28.1_81

Abstract

Text posted to social networking services such as Twitter and Facebook has recently attracted attention as big data, and such text can be treated as longitudinally observed text data. Extracting the longitudinal trends of keyword appearance and classifying them can summarize how the characteristics of longitudinal text data change over time. We propose an analytical method for longitudinally observed text data that applies the method of estimating semiparametric varying coefficients with a mixed effects model proposed by Satoh and Tonda (2013). Our method consists of a series of steps: estimating the probability of keyword appearance with a logistic regression fitted to keyword appearance in the longitudinally observed text data, then classifying and visualizing the longitudinal trends of keyword appearance using a summary of the predictors. Applied to the Hiroshima Peace Declaration, the method allowed us to describe the longitudinal trends of keyword appearance in the text. The time-dependent classification results and the keyword locations were visualized in a two-dimensional scatter plot, which provided additional information on the similarity between two classes and on how closely each class is associated with the keywords. Furthermore, practical interpretations of the classification results in light of the social background suggested that the proposed method is appropriate.
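The first two steps described above (estimating keyword appearance probabilities over time, then classifying the trends) can be illustrated with a deliberately simplified sketch. It is not the semiparametric varying coefficient estimator of Satoh and Tonda (2013): it assumes a plain logistic regression with a linear time trend per keyword, fitted by gradient ascent, and classifies each keyword by the sign of its estimated slope. The keyword names, the appearance data, and the helper functions `fit_logistic_trend` and `classify_trend` are all hypothetical and made up for illustration.

```python
import math

def fit_logistic_trend(times, present, lr=0.05, iters=5000):
    """Fit logit P(keyword appears at time t) = b0 + b1 * t by gradient ascent."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for t, y in zip(times, present):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            g0 += y - p          # log-likelihood gradient w.r.t. the intercept
            g1 += (y - p) * t    # log-likelihood gradient w.r.t. the slope
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

def classify_trend(slope, tol=0.1):
    """Assumed classification rule: label a keyword by the sign of its slope."""
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stable"

# Toy data (not from the paper): seven documents at centered time points,
# with 1 if the keyword appears in the document and 0 otherwise.
times = [-3, -2, -1, 0, 1, 2, 3]
appearance = {
    "keyword_a": [0, 0, 1, 0, 1, 1, 1],
    "keyword_b": [1, 1, 0, 1, 0, 0, 0],
    "keyword_c": [1, 0, 1, 0, 1, 0, 1],
}

fits = {k: fit_logistic_trend(times, y) for k, y in appearance.items()}
labels = {k: classify_trend(b1) for k, (b0, b1) in fits.items()}

for k in appearance:
    b0, b1 = fits[k]
    print(f"{k}: intercept={b0:.3f}, slope={b1:.3f}, class={labels[k]}")
```

The pairs (intercept, slope) could also serve as coordinates for a rough two-dimensional scatter plot of keywords, although the paper's own plot is built from its classification results and keyword locations rather than from these toy coefficients.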
