Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 3Rin4-78

Author Identification of Japanese works using Doc2Vec and BERT
*Taishi SHIMIZU
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Author identification from text has long been an active research topic. For Japanese texts, researchers have used a variety of methods based on features such as the distribution of part-of-speech n-grams and the distribution of characters, with classifiers such as random forests and neural networks. In this paper, I focus on Doc2Vec, proposed in 2014, and BERT, proposed in 2018, and perform supervised learning using these models together with neural networks. I downloaded the works used as training and test data from "Aozora Bunko", converted them into numerical vectors using Doc2Vec, and used these vectors as the input to the neural network. I performed multiclass classification and obtained an accuracy of 84.89% for Doc2Vec and 55.43% for BERT.
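
The classification stage of such a pipeline (fixed-length document vectors in, one author label out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the abstract does not give the network architecture, so a single-layer softmax classifier stands in for the neural network, and the two-dimensional vectors are made-up stand-ins for Doc2Vec embeddings. The function names and the three author labels are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Return the probability of each author class for one document vector."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)


def predict(weights, biases, x):
    """Return the index of the most probable author class."""
    probs = predict_proba(weights, biases, x)
    return max(range(len(probs)), key=probs.__getitem__)


def train_softmax_classifier(vectors, labels, n_classes,
                             epochs=200, lr=0.5, seed=0):
    """Fit a single-layer softmax classifier with per-sample gradient descent."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            probs = predict_proba(weights, biases, x)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


# Assumption: toy stand-ins for document vectors. Real Doc2Vec vectors would
# typically have far more dimensions. Labels 0, 1, 2 index three hypothetical
# authors.
train_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0],
                 [0.2, 0.9], [-1.0, -0.8], [-0.9, -1.0]]
train_labels = [0, 0, 1, 1, 2, 2]

weights, biases = train_softmax_classifier(train_vectors, train_labels,
                                           n_classes=3)

# Held-out toy vectors, used the same way a test split of works would be.
test_vectors = [[0.95, 0.15], [0.15, 0.95], [-0.95, -0.9]]
test_labels = [0, 1, 2]
predictions = [predict(weights, biases, x) for x in test_vectors]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"accuracy: {accuracy:.2%}")
```

In a fuller pipeline, the toy vectors would be replaced by the embeddings produced for each work, and accuracy would be measured on works held out from training, as in the figures reported above.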

© 2020 The Japanese Society for Artificial Intelligence