Kodo Keiryogaku (The Japanese Journal of Behaviormetrics)
Online ISSN : 1880-4705
Print ISSN : 0385-5481
ISSN-L : 0385-5481
Articles
Using Integrated Classification Algorithm to Identify a Text's Author
Mingzhe Jin

2014 Volume 41 Issue 1 Pages 35-46

Abstract

Text classification results often vary with the details of the analysis, including the feature data, the classification method, and the parameter settings adopted. The author of an anonymous text can generally be identified by extracting a set of distinctive features from the text and then using those features to find the most likely author. Numerous efforts have been made to develop more robust feature extraction techniques and classification algorithms, but an important remaining issue is how to select the feature datasets and the classification method. To address this issue, we propose an integrated classification algorithm that extracts multiple feature datasets from differing viewpoints and aspects of a text and applies multiple strong classifiers to those datasets. The proposed method achieved 100% accuracy in identifying the authors of literary works and student essays, and identified the author of all but 1 of 60 diaries written by 6 different people. Its accuracy was equivalent to or better than that obtained when any single strong classifier was applied to an individual feature dataset. Furthermore, the accuracy in identifying the authors of student essays increased by roughly two percentage points.
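
The general idea of combining several feature datasets with several classifiers can be illustrated with a small sketch. The abstract does not name the features, the classifiers, or the rule used to integrate their outputs, so everything below is an assumption made for illustration: character bigrams and word frequencies stand in for the feature datasets, two nearest-neighbour classifiers (cosine similarity and Manhattan distance) stand in for the "strong classifiers", and a simple majority vote stands in for the integration step. All function names are hypothetical.

```python
from collections import Counter
import math

# --- Feature extractors (illustrative stand-ins, not the paper's features) ---
def char_bigrams(text):
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def word_unigrams(text):
    return Counter(text.lower().split())

def normalize(counts):
    """Convert raw counts to relative frequencies."""
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}

# --- Similarity measures used by the two stand-in classifiers ---
def cosine_similarity(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def negative_manhattan(a, b):
    # Negated so that "larger is more similar", like cosine similarity.
    keys = set(a) | set(b)
    return -sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)

def nearest_neighbor(train_vecs, labels, query_vec, similarity):
    """Return the label of the training text most similar to the query."""
    best = max(range(len(train_vecs)),
               key=lambda i: similarity(train_vecs[i], query_vec))
    return labels[best]

FEATURE_EXTRACTORS = [char_bigrams, word_unigrams]
SIMILARITIES = [cosine_similarity, negative_manhattan]

def integrated_predict(train_texts, labels, query_text):
    """Majority vote over every (feature dataset, classifier) pair.

    The voting rule is an assumption; ties go to the label seen first.
    """
    votes = Counter()
    for extract in FEATURE_EXTRACTORS:
        train_vecs = [normalize(extract(t)) for t in train_texts]
        query_vec = normalize(extract(query_text))
        for similarity in SIMILARITIES:
            votes[nearest_neighbor(train_vecs, labels, query_vec, similarity)] += 1
    return votes.most_common(1)[0][0]

# Toy data, invented purely for the demonstration.
TRAIN_TEXTS = [
    "the sea was calm and the sea was wide",
    "the sea and the sky were calm",
    "robots compute fast robots never sleep",
    "fast robots compute and never sleep",
]
TRAIN_LABELS = ["A", "A", "B", "B"]

if __name__ == "__main__":
    print(integrated_predict(TRAIN_TEXTS, TRAIN_LABELS,
                             "the calm sea and the wide sky"))
```

With two feature extractors and two classifiers, each query receives four votes. A realistic setup would presumably use richer stylometric features and stronger learners, but the structure of the loop (every classifier applied to every feature dataset, then a combined decision) would stay the same.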

© 2014 The Behaviormetric Society