Nonlinear Theory and Its Applications, IEICE
Online ISSN : 2185-4106
ISSN-L : 2185-4106
Special Issue on Recent Progress in Nonlinear Theory and Its Applications
Genre classification ability of modern Japanese literary works based on word vectors trained on corpora of various sizes
Katsumi Hamaguchi, Yuya Matsuda, Jousuke Kuroiwa
Journal, Open Access

2025, Volume 16, Issue 3, Pages 681-690

Abstract

In the present paper, we investigate how genre classification of modern Japanese literary works depends on corpus size when word vectors are built with the CBOW model. Word vectors trained on a larger number of sentences could represent word meanings more accurately than vectors trained on fewer sentences, and a more accurate semantic representation of words could in turn yield more accurate genre classification of modern Japanese literary works. In computer experiments, we consider two classification problems: novel versus poetry, and novel versus essay. In both problems, contrary to our expectation, the word vector representation given by the CBOW model trained on the large corpus yields the worst classification accuracy. Thus, the variety of text in the Japanese corpus obscures the characteristic features of modern Japanese literary words in the semantic representation.
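The abstract does not say how word vectors are turned into a genre decision, so the following is only a minimal sketch of one common approach: average the word vectors of a text and assign the genre whose centroid is closest by cosine similarity. Everything in it is an assumption made for illustration: the two-dimensional vectors in WORD_VECTORS stand in for CBOW output, the English tokens and the toy training set are invented, and the nearest-centroid classifier is not claimed to be the authors' method.

```python
# Illustrative sketch only: the vectors, tokens, and classifier below are
# hypothetical and are not the data or the method of the paper.
from math import sqrt

# Hypothetical 2-dimensional word vectors standing in for CBOW output.
WORD_VECTORS = {
    "moon": [0.9, 0.1],
    "dream": [0.8, 0.2],
    "wind": [0.7, 0.3],
    "door": [0.2, 0.8],
    "said": [0.1, 0.9],
    "walked": [0.3, 0.7],
}


def document_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        raise ValueError("no token has a word vector")
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def train_centroids(labelled_docs):
    """Return one mean document vector per genre label."""
    grouped = {}
    for tokens, label in labelled_docs:
        grouped.setdefault(label, []).append(document_vector(tokens))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(tokens, centroids):
    """Pick the genre whose centroid is most similar to the document."""
    doc = document_vector(tokens)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Invented toy training set: (tokens, genre label).
TRAIN = [
    (["moon", "dream"], "poetry"),
    (["wind", "moon"], "poetry"),
    (["door", "said"], "novel"),
    (["walked", "said"], "novel"),
]
CENTROIDS = train_centroids(TRAIN)
```

In such a setup, the effect of corpus size could be studied by swapping WORD_VECTORS for vectors trained on corpora of different sizes while keeping the classifier and the labelled texts fixed.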

© 2025 The Institute of Electronics, Information and Communication Engineers

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/