Bulletin of Data Analysis of Japanese Classification Society
Online ISSN : 2434-3382
Print ISSN : 2186-4195
Article
Authorship Attribution in the Multi-genre Mingled Corpus
Yejia Liu, Mingzhe Jin

2022 Volume 11 Issue 1 Pages 1-14

Abstract

Authorship attribution is a branch of text classification that identifies the author of a text from a set of possible candidates. Conventional authorship attribution studies usually use texts from a single genre as the target corpus, but this tacit requirement is often not satisfied in real-world scenarios. To address this issue, this paper explores the possibility of using a multi-genre mingled corpus for authorship attribution. In particular, we selected fictional pieces and essays by five Japanese writers and identified their authors using combinations of 14 features and up to seven classifiers while varying the number of possible candidates. Further, we evaluated how susceptible these combinations are to a more complicated scenario in which two writers have works in multiple genres. The experimental results demonstrate that a multi-genre mingled corpus is suitable for authorship attribution and that satisfactory attribution performance can be achieved using appropriately chosen features and classifiers.

© 2022 Japanese Classification Society