2022 Volume 11 Issue 1 Pages 1-14
Authorship attribution is a branch of text classification that identifies the author of a text from a set of possible candidates. Conventional authorship attribution studies usually use texts from a single genre as the target corpus, but this tacit requirement is often not satisfied in real-world scenarios. To address this issue, we explored the possibility of using a multi-genre mingled corpus for authorship attribution. In particular, we selected fictional pieces and essays by five Japanese writers and identified their authors using combinations of 14 features and up to seven classifiers while varying the number of possible candidates. Further, we evaluated the susceptibilities of these combinations in a more complicated scenario in which two writers have works in multiple genres. The experimental results demonstrate that a multi-genre mingled corpus is suitable for authorship attribution and that satisfactory attribution performance can be achieved using appropriately chosen features and classifiers.