Author Identification of Japanese Works Using Doc2Vec and BERT

清水 大志

doi:10.11517/pjsai.JSAI2020.0_3Rin478

Abstract

Author identification from text has been studied for a long time. For Japanese texts, researchers have used a variety of methods built on features such as the distribution of part-of-speech n-grams and the distribution of characters, together with classification models such as random forests and neural networks. In this paper, I focus on Doc2Vec, proposed in 2014, and BERT, proposed in 2018, and perform supervised learning using these models and neural networks. I downloaded the works used as training and test data from "Aozora Bunko", converted them into numerical vectors using Doc2Vec, and used these vectors as the input of the neural network. I trained a multiclass classifier and obtained an accuracy of 84.89% for Doc2Vec and 55.43% for BERT.
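
The classification step described above (fixed-length document vectors in, one author label out, accuracy as the metric) can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state the network architecture, so a single-layer softmax classifier trained by plain SGD is assumed here, and the two-dimensional vectors are hypothetical toy stand-ins for the embeddings that Doc2Vec or BERT would produce. All function names are illustrative.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, vector):
    """Linear score per author: dot(w, x) + b."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def train(vectors, labels, n_classes, epochs=200, lr=0.5, seed=0):
    """Fit a single-layer softmax classifier with SGD on cross-entropy loss."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            probs = softmax(class_scores(weights, biases, vectors[i]))
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == labels[i] else 0.0)
                biases[c] -= lr * grad
                for d in range(dim):
                    weights[c][d] -= lr * grad * vectors[i][d]
    return weights, biases


def predict(weights, biases, vector):
    """Return the index of the highest-scoring author."""
    scores = class_scores(weights, biases, vector)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, biases, vectors, labels):
    """Fraction of documents whose predicted author matches the label."""
    hits = sum(predict(weights, biases, v) == y
               for v, y in zip(vectors, labels))
    return hits / len(labels)


# Hypothetical toy data: each row stands in for one document embedding,
# each label for one author (0, 1, 2). Real inputs would be Doc2Vec or
# BERT vectors of works taken from Aozora Bunko.
train_x = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0],
           [0.0, 0.9], [-1.0, -0.9], [-0.9, -1.0]]
train_y = [0, 0, 1, 1, 2, 2]
test_x = [[0.8, 0.1], [0.1, 0.8], [-0.8, -0.9]]
test_y = [0, 1, 2]

weights, biases = train(train_x, train_y, n_classes=3)
print(f"test accuracy: {accuracy(weights, biases, test_x, test_y):.2%}")
```

With real data, only the construction of `train_x` and `test_x` would change: the rest of the sketch is independent of which embedding model produced the vectors, which is what makes a side-by-side comparison of Doc2Vec and BERT accuracy possible.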
