2018 Volume 31 Issue 6 Pages 417-425
This study examined the effectiveness of author identification based on the rate of usage of words before periods, using texts by 100 bloggers. We analyzed one suspected text, one control text, and irrelevant texts by four bloggers using four multivariate analyses: (1) principal component analysis, (2) correspondence analysis, (3) multidimensional scaling, and (4) hierarchical cluster analysis, and we assigned scores based on the results of these analyses. Two conditions were set: “same author”, in which the suspected and control texts were written by the same author, and “different author”, in which they were written by different authors. A comparison of the score distributions between the two conditions indicated that the rate of usage of words before periods was effective for author identification, ranking next after the rate of usage of non-independent words and part-of-speech bigrams.
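
As a rough illustration of the feature described above, the rate of usage of words before periods might be computed as in the following minimal sketch. The function names (`pre_period_rates`, `feature_matrix`), the whitespace tokenizer, and the ASCII period are assumptions made for illustration only; the study's actual tokenization, scoring procedure, and multivariate analyses are not reproduced here.

```python
from collections import Counter


def pre_period_rates(text, period="."):
    """Relative frequency of each word that immediately precedes a period.

    Assumption: words are whitespace-delimited and sentences end with
    `period`. A real analysis would likely use a morphological analyzer.
    """
    # Split the text at each period and drop empty fragments.
    sentences = [s.strip() for s in text.split(period) if s.strip()]
    # Take the last token of each sentence as the "word before the period".
    last_words = [s.split()[-1].lower() for s in sentences]
    counts = Counter(last_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def feature_matrix(texts):
    """Build a texts-by-words matrix of usage rates over a shared vocabulary."""
    rates = [pre_period_rates(t) for t in texts]
    vocab = sorted(set().union(*rates))
    matrix = [[r.get(word, 0.0) for word in vocab] for r in rates]
    return vocab, matrix
```

A matrix of this shape, with one row per text (suspected, control, and irrelevant), is the kind of input that methods such as principal component analysis or hierarchical clustering typically take.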