Stylometrics analyzes the style of texts on the basis of measurable (metric) features. These methods have been applied mainly to modern Japanese texts and have proven effective, especially for authorship attribution. When they are applied to classical literary texts, however, the existence of multiple variants of the same work causes problems: a classical work typically survives in many variant texts, the original text itself rarely survives, and some variants differ greatly from the original. This paper validates a method that quantitatively represents the relationship between variants using edit distance or perplexity. Experiments on “Izumishikibu nikki”, one of the most popular diary works of the Heian period, show that the proposed method corresponds better to the findings of previous bibliographical studies than conventional principal component analysis using multiple metric features does. Furthermore, a comparison with “Sarashina nikki”, another diary work of the Heian period, confirms that the difference between variants of the same work is much smaller than the difference between different works.
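To make the two measures concrete, here is a minimal Python sketch. The abstract does not specify the unit of comparison, the normalization, or the language model, so the following are assumptions: character-level Levenshtein distance normalized by the length of the longer text, and the perplexity of one variant under an add-one-smoothed character bigram model trained on another variant. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer text (assumed normalization)."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def bigram_perplexity(train: str, test: str) -> float:
    """Perplexity of `test` under a character bigram model trained on `train`.

    Uses add-one (Laplace) smoothing over the combined character vocabulary;
    the choice of model and smoothing is an assumption for illustration.
    """
    if len(test) < 2:
        raise ValueError("test text needs at least two characters")
    vocab_size = len(set(train) | set(test))
    contexts = Counter(train[:-1])             # how often each char precedes another
    bigrams = Counter(zip(train, train[1:]))   # counts of adjacent character pairs
    log_prob = 0.0
    pairs = list(zip(test, test[1:]))
    for prev_ch, cur_ch in pairs:
        p = (bigrams[(prev_ch, cur_ch)] + 1) / (contexts[prev_ch] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))
```

A lower normalized distance or a lower perplexity suggests that two variants are closer; computing either measure for every pair of variants yields a matrix that could then be compared with bibliographical groupings. Unlike edit distance, perplexity is asymmetric, because it depends on which variant the model is trained on.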