Kodo Keiryogaku (The Japanese Journal of Behaviormetrics)
Online ISSN : 1880-4705
Print ISSN : 0385-5481
ISSN-L : 0385-5481
Original Article
Identifying the Writer of a Text Using an Integrated Classification Algorithm
Mingzhe Jin (金 明哲)
Journal, free access

2014, Volume 41, Issue 1, pp. 35-46

Abstract
Text classification results often vary with the details of the analysis, including the feature data, the classification method, and the parameter settings adopted. The author of an anonymous text can generally be identified by extracting a set of distinctive features from the text and then using those features to find the most likely author. Numerous efforts have been made to develop more robust feature extraction techniques and classification algorithms, but an important open issue is how to select the feature datasets and the classification method. To address this issue, we propose an integrated classification algorithm that extracts multiple feature datasets from differing viewpoints and aspects of a text and applies multiple strong classifiers to those datasets. The proposed method achieved 100% accuracy in identifying the authors of literary works and student essays, and correctly identified the author of all but 1 of 60 diaries written by 6 different people. Its accuracy was equal to or better than that obtained when any single strong classifier was applied to an individual feature dataset. Furthermore, the accuracy in identifying the authors of student essays increased by roughly two percentage points.
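The general idea of applying several classifiers to several feature datasets and then integrating their outputs can be illustrated with a minimal Python sketch. The abstract does not state which features, which classifiers, or which integration rule are used, so everything below is an assumption made for illustration: the three feature extractors (`char_bigrams`, `word_lengths`, `punctuation`), the two simple classifiers (`nearest_centroid`, `one_nearest_neighbour`, which are lightweight stand-ins and not strong classifiers), and the majority-vote rule in the hypothetical function `identify_author`.

```python
from collections import Counter
import math

# --- Feature extractors (illustrative choices, not the paper's feature sets) ---
def char_bigrams(text):
    """Relative frequencies of character bigrams."""
    grams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}

def word_lengths(text):
    """Relative frequencies of whitespace-separated token lengths."""
    lens = Counter(len(w) for w in text.split())
    total = sum(lens.values()) or 1
    return {n: c / total for n, c in lens.items()}

def punctuation(text):
    """Relative frequencies of punctuation marks."""
    marks = Counter(ch for ch in text if ch in ".,;:!?-")
    total = sum(marks.values()) or 1
    return {m: c / total for m, c in marks.items()}

# --- Classifiers (simple stand-ins; the paper's classifiers are not named here) ---
def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def manhattan(a, b):
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))

def nearest_centroid(train, query):
    """Predict the author whose mean feature vector is most cosine-similar."""
    sums, counts = {}, Counter()
    for vec, author in train:
        acc = sums.setdefault(author, Counter())
        for k, v in vec.items():
            acc[k] += v
        counts[author] += 1
    centroids = {a: {k: v / counts[a] for k, v in s.items()}
                 for a, s in sums.items()}
    return max(sorted(centroids), key=lambda a: cosine(centroids[a], query))

def one_nearest_neighbour(train, query):
    """Predict the author of the closest training text (Manhattan distance)."""
    return min(train, key=lambda item: manhattan(item[0], query))[1]

EXTRACTORS = [char_bigrams, word_lengths, punctuation]
CLASSIFIERS = [nearest_centroid, one_nearest_neighbour]

def identify_author(labelled_texts, anonymous_text):
    """Majority vote over every (feature dataset, classifier) pair.

    labelled_texts is a list of (text, author) pairs. The voting rule is an
    assumption of this sketch, not a description of the published method.
    """
    votes = Counter()
    for extract in EXTRACTORS:
        train = [(extract(t), author) for t, author in labelled_texts]
        query = extract(anonymous_text)
        for classify in CLASSIFIERS:
            votes[classify(train, query)] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return min(votes, key=lambda a: (-votes[a], a)), dict(votes)
```

Calling `identify_author(labelled_texts, anonymous_text)` returns the winning author together with the vote tally, so a disagreement between feature datasets or classifiers remains visible. With three extractors and two classifiers, six votes are cast per text. In this setup a single misleading feature dataset or classifier can be outvoted by the others.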
© 2014 The Behaviormetric Society of Japan