Mathematical Linguistics
Online ISSN : 2433-0302
Print ISSN : 0453-4611
Invited Paper (B) to the Special Issue
Author Identification Based on Rate of Usage of Words before Period
Multivariate Analyses and Scoring
Wataru Zaitsu, Mingzhe Jin
JOURNAL OPEN ACCESS

2018 Volume 31 Issue 6 Pages 417-425

Abstract

This study examined the effectiveness of author identification based on the rate of usage of words before periods, using texts by 100 bloggers. We analyzed one suspected text, one control text, and irrelevant texts by four bloggers using four multivariate analyses: (1) principal component analysis, (2) correspondence analysis, (3) multidimensional scaling, and (4) hierarchical cluster analysis, and assigned scores based on the results of these analyses. Two conditions were set: “same author,” in which the suspected and control texts were written by the same author, and “different author,” in which they were written by different authors. A comparison of the score distributions between the two conditions indicated that the rate of usage of words before periods was effective for author identification, next to the rate of usage of non-independent words and part-of-speech bigrams.
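
As a rough illustration of the feature named in the abstract, the following minimal sketch computes the rate of usage of words before periods for each text and checks how close the control text lies to the suspected text relative to the irrelevant texts. Everything in it is an assumption made for illustration: the function names, the regular-expression tokenization of space-delimited text (Japanese blog text would need a morphological analyzer), and the Euclidean distance ranking, which stands in for the four multivariate analyses and is not the scoring procedure used in the paper.

```python
from collections import Counter
import math
import re


def words_before_periods(text):
    """Return the lowercased word immediately preceding each period."""
    return [w.lower() for w in re.findall(r"(\w+)\s*\.", text)]


def usage_rates(text, vocabulary):
    """Relative frequency of each vocabulary word among sentence-final words."""
    counts = Counter(words_before_periods(text))
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in vocabulary]


def euclidean(a, b):
    """Euclidean distance between two equal-length rate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_of_control(suspected, control, irrelevant):
    """Rank (1 = closest to the suspected text) of the control text.

    Hypothetical stand-in for the paper's scoring: the control text is
    ranked against the irrelevant texts by distance to the suspected text.
    """
    texts = [suspected, control] + list(irrelevant)
    # Shared vocabulary of sentence-final words across all compared texts.
    vocabulary = sorted({w for t in texts for w in words_before_periods(t)})
    vectors = [usage_rates(t, vocabulary) for t in texts]
    distances = [euclidean(vectors[0], v) for v in vectors[1:]]
    control_distance = distances[0]
    # Count irrelevant texts that are strictly closer than the control text.
    return 1 + sum(d < control_distance for d in distances[1:])
```

Under these assumptions, a rank of 1 means that no irrelevant text is closer to the suspected text than the control text, which is the pattern one would expect more often in the “same author” condition than in the “different author” condition.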

© Mathematical Linguistic Society of Japan

This article is licensed under a Creative Commons [Attribution - NonCommercial - NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ja