Kodo Keiryogaku (The Japanese Journal of Behaviormetrics)
Online ISSN : 1880-4705
Print ISSN : 0385-5481
ISSN-L : 0385-5481
Articles
Accuracy and Standardized Judgment Procedures for Author Identification by Text Mining
Wataru Zaitsu, Mingzhe Jin

2018 Volume 45 Issue 1 Pages 39-47

Abstract

This study examined the accuracy of author identification by text mining. We conducted 16 analyses (four writing-style features × four multivariate analyses) on texts by 100 bloggers, each approximately 1,000 characters long. Specifically, we applied (1) principal component analysis, (2) correspondence analysis, (3) multidimensional scaling, and (4) hierarchical cluster analysis to each of four writing-style features: (1) the rate of usage of non-independent words, (2) part-of-speech bigrams, (3) postpositional-particle bigrams, and (4) the positioning of commas. Accuracy was high: 100% sensitivity and 95.1% specificity. Furthermore, the results showed no effect of the bloggers' age or gender on the accuracy of author identification.
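The bigram features and accuracy measures named above can be illustrated with a minimal Python sketch. It assumes each text has already been tokenized and part-of-speech tagged (for Japanese this is typically done by a morphological analyzer, which is not shown), uses hypothetical tag labels, and uses Euclidean distance between bigram profiles as one possible input to multidimensional scaling or hierarchical clustering; the distance measure actually used in the study is not stated here. Sensitivity is taken as the share of same-author pairs judged to be by the same author, and specificity as the share of different-author pairs judged to be by different authors. This is an assumed reading, not the paper's exact judgment procedure.

```python
from collections import Counter
from math import sqrt


def bigram_profile(tags):
    """Relative frequency of each adjacent (tag, tag) pair in one tagged text.

    Returns an empty dict when the text has fewer than two tags.
    """
    bigrams = Counter(zip(tags, tags[1:]))
    total = sum(bigrams.values())
    return {bg: n / total for bg, n in bigrams.items()} if total else {}


def euclidean(p, q):
    """Euclidean distance between two sparse frequency profiles (assumed metric)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def sensitivity_specificity(judgments):
    """Compute (sensitivity, specificity) from pairwise authorship judgments.

    judgments: iterable of (truly_same_author, judged_same_author) booleans.
    A ratio is NaN when its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for truth, judged in judgments:
        if truth and judged:
            tp += 1          # same author, judged same
        elif truth:
            fn += 1          # same author, judged different
        elif judged:
            fp += 1          # different authors, judged same
        else:
            tn += 1          # different authors, judged different
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical tag sequences for two short texts.
a = bigram_profile(["noun", "particle", "verb", "aux"])
b = bigram_profile(["noun", "particle", "noun", "particle"])
print(euclidean(a, b))
print(sensitivity_specificity([(True, True), (True, True), (False, False), (False, True)]))
```

With these toy inputs, the two profiles are about 0.667 apart, and the four judgments give a sensitivity of 1.0 and a specificity of 0.5. In a real analysis, one profile per text would be stacked into a matrix (one column per bigram type) before being passed to the multivariate methods.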

© 2018 The Behaviormetric Society